CYSTIC FIBROSIS GENE

This invention was made with government support under
Grants R01 DK39690-02 and DK34944 awarded by the United
States National Institutes of Health. The government
has certain rights in the invention.

FIELD OF THE INVENTION

The present invention relates generally to the
cystic fibrosis (CF) gene and, more particularly, to the
identification, isolation and cloning of the DNA
sequence corresponding to the normal and mutant CF
genes, as well as their transcripts and gene products.
The present invention also relates to methods of
screening for and detection of CF carriers, CF
diagnosis, prenatal CF screening and diagnosis, and gene
therapy utilizing recombinant technologies, and to drug
therapy using the information derived from the DNA, the
protein, and the metabolic function of the protein.

BACKGROUND OF THE INVENTION 

Cystic fibrosis (CF) is the most common severe
autosomal recessive genetic disorder in the Caucasian
population. It affects approximately 1 in 2000 live
births in North America [Boat et al., The Metabolic
Basis of Inherited Disease, 6th ed., pp. 2649-2680,
McGraw-Hill, NY (1989)]. Approximately 1 in 20 persons
are carriers of the disease.
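
The incidence and carrier figures quoted above are linked by the Hardy-Weinberg principle. As a rough consistency check, the sketch below assumes Hardy-Weinberg equilibrium and treats all CF mutations as a single recessive allele class (both are simplifying assumptions not stated in this document); the function name `carrier_frequency` is illustrative only.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Estimate the heterozygote (carrier) frequency 2pq from the
    frequency q^2 of affected homozygotes, assuming Hardy-Weinberg
    equilibrium and one recessive disease allele class."""
    q = math.sqrt(incidence)  # frequency of mutant alleles
    p = 1.0 - q               # frequency of normal alleles
    return 2.0 * p * q


freq = carrier_frequency(1 / 2000)
print(f"carrier frequency ~ {freq:.4f} (about 1 in {1 / freq:.0f})")
```

For an incidence of 1 in 2000 this gives a carrier frequency of about 0.044, or roughly 1 in 23, which is in line with the "approximately 1 in 20" figure above.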

Although the disease was first described in the
late 1930's, the basic defect remains unknown. The
major symptoms of cystic fibrosis include chronic
pulmonary disease, pancreatic exocrine insufficiency,
and elevated sweat electrolyte levels. The symptoms are
consistent with cystic fibrosis being an exocrine
disorder. Although recent advances have been made in
the analysis of ion transport across the apical membrane
of the epithelium of CF patient cells, it is not clear
that the abnormal regulation of chloride channels
represents the primary defect in the disease. Given the
lack of understanding of the molecular mechanism of the
disease, an alternative approach has been taken in an
attempt to understand the nature of the molecular
defect: direct cloning of the responsible gene on the
basis of its chromosomal location.

However, there is no clear phenotype that directs
an approach to the exact nature of the genetic basis of
the disease, or that allows for an identification of the
cystic fibrosis gene. The nature of the CF defect in
relation to the population genetics data has not been
readily apparent. Both the prevalence of the disease
and the clinical heterogeneity have been explained by
several different mechanisms: high mutation rate,
heterozygote advantage, genetic drift, multiple loci,
and reproductive compensation.

Many of these hypotheses cannot be tested due to the
lack of knowledge of the basic defect. Therefore,
alternative approaches to the determination and
characterization of the CF gene have focussed on an
attempt to identify the location of the gene by genetic
analysis.

Linkage analysis of the CF gene to antigenic and
protein markers was attempted in the 1950's, but no
positive results were obtained [Steinberg et al., Am. J.
Hum. Genet. 8: 162-176 (1956); Steinberg and Morton, Am.
J. Hum. Genet. 8: 177-189 (1956); Goodchild et al.,
Med. Genet. 7: 417-419 (1976)].

More recently, it has become possible to use
RFLP's to facilitate linkage analysis. The first
linkage of an RFLP marker to the CF gene was disclosed
in 1985 [Tsui et al., Science 230: 1054-1057 (1985)], in
which linkage was found between the CF gene and an
uncharacterized marker, D0CRI-917. The association was
found in an analysis of 39 families with affected CF
children. This showed that, although the chromosomal
location had not been established, the location of the
disease gene had been narrowed to about 1% of the human
genome, or about 30 million nucleotide base pairs.
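
Linkage between a marker and a disease locus is conventionally summarized by a LOD score, the base-10 log of the likelihood ratio for linkage at recombination fraction theta versus free recombination (theta = 0.5). The sketch below is a simplified illustration, not the analysis used in the cited work: it assumes every meiosis is fully informative and phase-known, and the counts (2 recombinants in 20 meioses) are hypothetical. A LOD of 3 or more is the customary threshold for declaring linkage.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for linkage at recombination fraction theta.

    Simplifying assumption: all meioses are fully informative and
    phase-known, so the likelihood is binomial in the recombinant count.
    """
    if not 0.0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    nonrecombinants = meioses - recombinants
    # log10 of L(theta) / L(0.5); the binomial coefficient cancels.
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            - meioses * math.log10(0.5))


# Hypothetical data: 2 recombinants observed in 20 informative meioses.
best_lod, best_theta = max(
    (lod_score(2, 20, t / 100), t / 100) for t in range(1, 50)
)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.2f}")
```

The maximum falls at theta = 2/20 = 0.10 with a LOD of about 3.2, just above the conventional threshold.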

The chromosomal location of the D0CRI-917 probe was
established using rodent-human hybrid cell lines
containing different human chromosome complements; it
was shown that D0CRI-917 (and therefore the CF gene)
maps to human chromosome 7.

Further physical and genetic linkage studies were
pursued in an attempt to pinpoint the location of the CF
gene. Zengerling et al. [Am. J. Hum. Genet. 40: 228-236
(1987)] describe the use of human-mouse somatic cell
hybrids to obtain a more detailed physical relationship
between the CF gene and the markers known to be linked
with it. This publication shows that the CF gene can be
assigned to either the distal region of band q22 or the
proximal region of band q31 on chromosome 7.

Rommens et al. [Am. J. Hum. Genet. 43: 645-663
(1988)] give a detailed discussion of the isolation of
many new 7q31 probes. The approach outlined led to the
isolation of two new probes, D7S122 and D7S340, which
are close to each other. Pulsed field gel
electrophoresis mapping indicates that these two RFLP
markers lie between two markers known to flank the CF
gene, MET [White, R., Woodward, S., Leppert, M., et al.,
Nature 318: 382-384 (1985)] and D7S8 [Wainwright, B. J.,
Scambler, P. J., and Schmidtke, J., Nature 318: 384-385
(1985)], and are therefore in the CF gene region. The
discovery of these markers provides a starting point for
chromosome walking and jumping.

Estivill et al. [Nature 326: 840-845 (1987)]
disclose that a candidate cDNA gene was located and
partially characterized. This, however, does not teach
the correct location of the CF gene. The reference
discloses a candidate cDNA gene downstream of a CpG
island; CpG islands are undermethylated, GC
nucleotide-rich regions upstream of many vertebrate
genes. The chromosomal localization of the candidate
locus is identified as the XV2C region. This region is
described in European Patent Application 88303645.1.
However, that region does not include the CF gene.

A major difficulty in identifying the CF gene has
been the lack of cytologically detectable chromosome
rearrangements or deletions, which greatly facilitated
all previous successes in the cloning of human disease
genes by knowledge of map position.

Such rearrangements and deletions could be observed
cytologically and, as a result, a physical location on a
particular chromosome could be correlated with the
particular disease. Further, this cytological location
could be correlated with a molecular location based on
the known relationship between publicly available DNA
probes and cytologically visible alterations in the
chromosomes. Knowledge of the molecular location of the
gene for a particular disease would allow cloning and
sequencing of that gene by routine procedures,
particularly when the gene product is known and cloning
success can be confirmed by immunoassay of expression
products of the cloned genes.

In contrast, neither the cytological location nor
the gene product of the gene for cystic fibrosis was
known in the prior art. With the recent identification
of MET and D7S8, markers which flanked the CF gene but
did not pinpoint its molecular location, the present
inventors devised various novel gene cloning strategies
to approach the CF gene in accordance with the present
invention. The methods employed in these strategies
include chromosome jumping from the flanking markers,
cloning of DNA fragments from a defined physical region
with the use of pulsed field gel electrophoresis, a
combination of somatic cell hybrid and molecular cloning
techniques designed to isolate DNA fragments from
undermethylated CpG islands near CF, chromosome
microdissection and cloning, and saturation cloning of a
large number of DNA markers from the 7q31 region. By
means of these novel strategies, the present inventors
were able to identify the gene responsible for cystic
fibrosis where the prior art was uncertain or, in one
case, wrong.

The application of these genetic and molecular
cloning strategies has allowed the isolation and cDNA
cloning of the cystic fibrosis gene on the basis of its
chromosomal location, without the benefit of genomic
rearrangements to point the way. The identification of
the normal and mutant forms of the CF gene and gene
products has allowed for the development of screening
and diagnostic tests for CF utilizing nucleic acid
probes and antibodies to the gene product. Through
interaction with the defective gene product and the
pathway in which this gene product is involved, therapy
through normal gene product supplementation and gene
manipulation and delivery is now made possible.

SUMMARY OF THE INVENTION

The gene involved in the cystic fibrosis disease
process, hereinafter the "CF gene", and its functional
equivalents have been identified, isolated and cDNA
cloned, and its transcripts and gene products identified
and sequenced. A three base pair deletion leading to
the omission of a phenylalanine residue in the gene
product has been determined to correspond to the
mutations of the CF gene in approximately 70% of the
patients affected with CF, with different mutations
involved in most if not all the remaining cases.

With the identification and sequencing of the gene
and its gene product, nucleic acid probes and antibodies
raised to the gene product can be used in a variety of
hybridization and immunological assays to screen for and
detect the presence of either a normal or a defective CF
gene or gene product. Assay kits for such screening and
diagnosis can also be provided.

Patient therapy through supplementation with the
normal gene product, whose production can be amplified
using genetic and recombinant techniques, or its
functional equivalent, is now also possible. Correction
or modification of the defective gene product through
drug treatment is now possible. In addition,
cystic fibrosis can be cured or controlled through gene
therapy by correcting the gene defect in situ or by
using recombinant or other vehicles to deliver a DNA
sequence capable of expression of the normal gene
product to the cells of the patient.

According to an aspect of the invention, a DNA
molecule comprises a DNA sequence selected from the
group consisting of:

(a) DNA sequences which correspond to the DNA
sequence as set forth in the following Figure 1 from
amino acid residue position 1 to position 1480;

(b) DNA sequences encoding normal CFTR polypeptide
having the sequence according to the following Figure 1
for amino acid residue positions from 1 to 1480;

(c) DNA sequences which correspond to a fragment
of the sequence of the following Figure 1 including at
least 16 sequential nucleotides between amino acid
residue positions 1 and 1480;

(d) DNA sequences which comprise at least 16
nucleotides and encode a fragment of the amino acid
sequence of the following Figure 1; and

(e) DNA sequences encoding an epitope encoded by
at least 18 sequential nucleotides in the sequence of
the following Figure 1 between amino acid residue
positions 1 and 1480.

According to another aspect of the invention, a
purified mutant CF gene comprises a DNA sequence
encoding an amino acid sequence for a protein where the
protein, when expressed in cells of the human body, is
associated with altered cell function which correlates
with the genetic disease cystic fibrosis.

According to another aspect of the invention, a
purified RNA molecule comprises an RNA sequence
corresponding to the above DNA sequence.

According to another aspect of the invention, a DNA 
molecule comprises a cDNA molecule corresponding to the 
above DNA sequence. 

According to another aspect of the invention, a
purified nucleic acid probe comprises a DNA or RNA
nucleotide sequence corresponding to the above noted
selected DNA sequences of groups (a) to (e).

According to another aspect of the invention, a 
DNA molecule comprises a DNA sequence encoding mutant 
CFTR polypeptide having the sequence according to the 
following Figure 1 for amino acid residue positions 1 to 
1480. The sequence is further characterized by a three 
base pair mutation which results in the deletion of 
phenylalanine from amino acid residue position 508. 

According to another aspect of the invention, a DNA 
molecule comprises a cDNA molecule corresponding to the 
above DNA sequence. 

According to another aspect of the invention, the
cDNA molecule comprises a DNA sequence selected from the
group consisting of:

(a) DNA sequences which correspond to the mutant
DNA sequence and which encode, on expression, mutant
CFTR polypeptide;

(b) DNA sequences which correspond to a fragment
of the mutant DNA sequences, including at least twenty
nucleotides;

(c) DNA sequences which comprise at least twenty
nucleotides and encode a fragment of the mutant CFTR
protein amino acid sequence; and

(d) DNA sequences encoding an epitope encoded by
at least eighteen sequential nucleotides in the mutant
DNA sequence.

According to another aspect of the invention, a
purified RNA molecule comprises an RNA sequence
corresponding to the mutant DNA sequence.

A purified nucleic acid probe is also provided,
comprising a DNA or RNA nucleotide sequence
corresponding to the mutant sequences as recited above.

According to another aspect of the invention, a
recombinant cloning vector comprising the DNA sequences
of the normal or mutant DNA and fragments thereof is
provided. The vector, according to an aspect of this
invention, is operatively linked to an expression
control sequence in the recombinant DNA molecule so
that the normal CFTR protein can be expressed or,
alternatively, so that with the selected mutant DNA
sequence the mutant CFTR polypeptide can be expressed.
The expression control sequence is selected from the
group consisting of sequences that control the
expression of genes of prokaryotic or eukaryotic cells
and their viruses, and combinations thereof.

According to another aspect of the invention, a
method for producing normal CFTR polypeptide comprises
the steps of:

(a) culturing a host cell transfected with the
recombinant vector for the normal DNA sequence in a
medium and under conditions favorable for expression of
the normal CFTR polypeptide; and

(b) isolating the expressed normal CFTR
polypeptide.

According to another aspect of the invention, a
method for producing a mutant CFTR polypeptide comprises
the steps of:

(a) culturing a host cell transfected with the
recombinant vector for the mutant DNA sequence in a
medium and under conditions favorable for expression of
the mutant CFTR polypeptide; and

(b) isolating the expressed mutant CFTR
polypeptide.

WO 91/02796 PCT/CA90/00267

According to another aspect of the invention, a
purified protein of human cell membrane origin comprises
an amino acid sequence encoded by the mutant DNA
sequence, where the protein, when present in the human
cell membrane, is associated with cell function which
causes the genetic disease cystic fibrosis.

According to another aspect of the invention, the
CFTR polypeptide is characterized by a molecular weight
of about 170,000 daltons and an epithelial cell
transmembrane ion conductance affecting activity.

According to another aspect of the invention, a
substantially pure CFTR protein is provided, which is
normally expressed in human epithelial cells and is
characterized by being capable of participating in
regulation and in control of ion transport through
epithelial cells by binding to the epithelial cell
membrane to modulate ion movement through channels
formed in the epithelial cell membrane.

According to another aspect of the invention, a
process for isolating the CFTR protein comprises:

(a) extracting peripheral proteins from membranes
to provide membrane material having integral proteins
including said CFTR protein;

(b) solubilizing said integral proteins of said
membrane material to form a solution of said integral
proteins; and

(c) separating said CFTR protein to remove any
remaining other proteins of mammalian origin.

According to another aspect of the invention, a
method is provided for screening a subject to determine
if the subject is a CF carrier or a CF patient,
comprising the steps of providing a biological sample of
the subject to be screened and providing an assay for
detecting, in the biological sample, the presence of at
least a member from the group consisting of the normal
CF gene, normal CF gene products, a mutant CF gene,
mutant CF gene products and mixtures thereof.
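
As one illustration of such an assay, hybridization with a probe for the normal sequence (Probe N) and a probe for the F508 deletion (Probe F), as in Figure 15, can be read out as a genotype at that single site. The sketch below is a hypothetical interpretation step only: it assumes a clean yes/no hybridization signal per probe, and the names `F508Genotype` and `classify_f508` are invented for illustration. A heterozygous result indicates a carrier only if the other chromosome carries no other CF mutation, and a normal result does not exclude the other CF mutations that make up the remaining cases.

```python
from enum import Enum


class F508Genotype(Enum):
    """Genotype at the F508 site inferred from two allele-specific probes."""
    NORMAL_HOMOZYGOTE = "normal sequence at F508 on both chromosomes"
    HETEROZYGOTE = "one normal and one F508-deleted chromosome"
    DELETION_HOMOZYGOTE = "F508 deletion on both chromosomes"


def classify_f508(normal_probe_binds: bool,
                  mutant_probe_binds: bool) -> F508Genotype:
    """Interpret hybridization of the normal-sequence probe (Probe N)
    and the deletion-specific probe (Probe F) to a sample's DNA."""
    if normal_probe_binds and mutant_probe_binds:
        return F508Genotype.HETEROZYGOTE
    if mutant_probe_binds:
        return F508Genotype.DELETION_HOMOZYGOTE
    if normal_probe_binds:
        return F508Genotype.NORMAL_HOMOZYGOTE
    # Neither probe hybridized: treat as a failed assay, not a genotype.
    raise ValueError("no hybridization signal; repeat the assay")


for n_signal, f_signal in [(True, False), (True, True), (False, True)]:
    print(n_signal, f_signal, classify_f508(n_signal, f_signal).value)
```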

According to another aspect of the invention, an
immunologically active anti-CFTR polyclonal or
monoclonal antibody specific for CFTR polypeptide is
provided.

According to another aspect of the invention, a kit
for assaying for the presence of a CF gene by
immunoassay techniques comprises:

(a) an antibody which specifically binds to a gene
product of the CF gene;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

According to another aspect of the invention, a kit
for assaying for the presence of a CF gene by
hybridization technique comprises:

(a) an oligonucleotide probe which specifically
binds to the CF gene;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to the CF gene; and

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay.

According to another aspect of the invention, a 
method is provided for treatment for cystic fibrosis in 
a patient. The treatment comprises the step of 
administering to the patient a therapeutically effective 
amount of the normal CFTR protein. 

According to another aspect of the invention, a 
method of gene therapy for cystic fibrosis comprises the 
step of delivery of a DNA molecule which includes a 
sequence corresponding to the normal DNA sequence 
encoding for normal CFTR protein. 




According to another aspect of the invention, an
animal comprises a heterologous cell system. The cell
system includes a recombinant cloning vector which
includes the recombinant DNA sequence corresponding to
the mutant DNA sequence which induces cystic fibrosis
symptoms in the animal.

According to another aspect of the invention, a
transgenic mouse exhibits cystic fibrosis symptoms.

Figure 1 is the nucleotide sequence of the CF gene
and the amino acid sequence of the CFTR protein.

Figure 2 is a restriction map of the CF gene and
the schematic strategy used to chromosome walk and jump
to the gene.

Figure 3 is a pulsed field gel electrophoresis map
of the region including and surrounding the CF gene.

Figures 4A, 4B and 4C show the detection of
conserved nucleotide sequences by cross-species
hybridization.

Figure 4D is a restriction map of overlapping
segments of probes E4.3 and H1.6.

Figure 5 is an RNA blot hybridization analysis;
hybridization to RNA from several sources, including
liver, HMO, T84 and brain RNA, is shown.

Figure 6 shows the methylation status of the E4.3
cloned region at the 5' end of the CF gene.

Figure 7 is a restriction map of the CFTR cDNA
showing the alignment of the cDNA to the genomic DNA
fragments.

Figure 8 is an RNA gel blot analysis depicting
hybridization by a portion of the CFTR cDNA to an RNA
transcript of about 6.5 kb in various human tissues.

Figure 9 is a DNA blot hybridization analysis
depicting hybridization by the CFTR cDNA clones to
genomic DNA digested with EcoRI and HindIII.

Figure 10 is a primer extension experiment
characterizing the 5' and 3' ends of the CFTR cDNA.

Figure 11 is a hydropathy profile and shows
predicted secondary structures of CFTR.

Figure 12 is a dot matrix analysis of internal
homologies in the predicted CFTR polypeptide.

Figure 13 is a schematic model of the predicted 
CFTR protein. 

Figure 14 is a schematic diagram of the restriction
fragment length polymorphisms (RFLP's) closely linked
to the CF gene, where the inverted triangle indicates
the location of the F508 3 base pair deletion.

Figure 15 represents the detection of the F508
mutation by oligonucleotide hybridization, with Probe N
detecting the normal sequence and Probe F detecting the
CF mutant sequence.

Figure 16 represents an alignment of the most
conserved segments of the extended NBFs of CFTR with
comparable regions of other proteins.

Figure 17 is the DNA sequence around the F508
deletion.

Figure 18 is a representation of the nucleotide 
sequencing gel showing the DNA sequence at the F508 
deletion. 

25 Figures 19a and 19b are Coomassie Blue. stained 

polyacrylamide gels following electrophoresis of protein 
from bacterial ly sates (JK 101) which bacteria was 
transformed with the pGEX plasmids. 

Figure 20 shows immunoblots of bacterial lysates
containing fusion protein #1 (of Table 8) probed with
preimmune and immune sera from two different rabbits.

Figure 21 is an immunoblot of T-84 membranes using 
immune serum from rabbit #1 of Figure 20. 

Figure 22 shows immunodot blots probed with preimmune
and immune sera from a rabbit immunized with the KLH
conjugate of peptide #2 of Table 8.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. DEFINITIONS

In order to facilitate review of the various
embodiments of the invention and an understanding of
various elements and constituents used in making and
using the invention, the terms used in the description
of the invention are defined as follows:
CF - cystic fibrosis 

CF carrier - a person in apparent health whose
chromosomes contain a mutant CF gene that may be
transmitted to that person's offspring.

CF patient - a person who carries a mutant CF gene
on each chromosome, such that they exhibit the clinical
symptoms of cystic fibrosis.

CF gene - the gene whose mutant forms are
associated with the disease cystic fibrosis. This
definition is understood to include the various sequence
polymorphisms that exist, wherein nucleotide
substitutions in the gene sequence do not affect the
essential function of the gene product. This term
primarily relates to an isolated coding sequence, but
can also include some or all of the flanking regulatory
elements and/or introns.

CF-PI - cystic fibrosis pancreatic insufficient,
the major clinical subgroup of cystic fibrosis patients,
characterized by insufficient pancreatic exocrine
function.

CF-PS - cystic fibrosis pancreatic sufficient, a
clinical subgroup of cystic fibrosis patients with
sufficient pancreatic exocrine function for normal
digestion of food.

CFTR - cystic fibrosis transmembrane conductance
regulator protein, encoded by the CF gene. This
definition includes the protein as isolated from human
or animal sources, as produced by recombinant organisms,
and as chemically or enzymatically synthesized. This
definition is understood to include the various
polymorphic forms of the protein wherein amino acid
substitutions in the variable regions of the sequence
do not affect the essential functioning of the
protein, or its hydropathic profile or secondary or
tertiary structure.

DNA - standard nomenclature is used to identify the 
bases. 

Intronless DNA - a piece of DNA lacking internal
non-coding segments, for example, cDNA.

IRP locus sequence - (protooncogene int-1 related),
a gene located near the CF gene.

Mutant CFTR - a protein that is highly analogous to
CFTR in terms of primary, secondary, and tertiary
structure, but wherein a small number of amino acid
substitutions and/or deletions and/or insertions result
in impairment of its essential function, so that
organisms whose epithelial cells express mutant CFTR
rather than CFTR demonstrate the symptoms of cystic
fibrosis.

mCF - a mouse gene orthologous to the human CF gene
NBFs - nucleotide (ATP) binding folds
ORF - open reading frame
PCR - polymerase chain reaction
Protein - standard single letter nomenclature is
used to identify the amino acids

R-domain - a highly charged cytoplasmic domain of
the CFTR protein

RSV - Rous Sarcoma Virus
SAP - surfactant protein

RFLP - restriction fragment length polymorphism

2. ISOLATING THE CF GENE

Using chromosome walking, jumping, and cDNA
hybridization, DNA sequences encompassing more than 500
kilobase pairs (kb) have been isolated from a region on
the long arm of human chromosome 7 containing the cystic
fibrosis (CF) gene. Several transcribed sequences and
conserved segments have been identified in this region.
One of these corresponds to the CF gene and spans
approximately 250 kb of genomic DNA. Overlapping
complementary DNA (cDNA) clones have been isolated from
epithelial cell libraries with a genomic DNA segment
containing a portion of the cystic fibrosis gene. The
nucleotide sequence of the isolated cDNA is shown in
Figure 1. In each row of the respective sequences, the
lower row lists the nucleotide sequence in standard
nomenclature. The upper row in each respective row of
sequences gives, in standard single letter
nomenclature, the amino acid corresponding to the
respective codon.

Accordingly, the invention provides a cDNA molecule
comprising a DNA sequence selected from the group
consisting of:

(a) DNA sequences which correspond to the DNA
sequence of Figure 1 from amino acid residue position 1
to position 1480;

(b) DNA sequences encoding normal CFTR polypeptide
having the sequence according to Figure 1 for amino acid
residue positions from 1 to 1480;

(c) DNA sequences which correspond to a fragment
of the sequence of Figure 1 including at least 16
sequential nucleotides between amino acid residue
positions 1 and 1480;

(d) DNA sequences which comprise at least 16
nucleotides and encode a fragment of the amino acid
sequence of Figure 1; and

(e) DNA sequences encoding an epitope encoded by
at least 18 sequential nucleotides in the sequence of
Figure 1 between amino acid residue positions 1 and
1480.

The invention also provides a cDNA molecule
comprising a DNA sequence selected from the group
consisting of:

a) DNA sequences which correspond to the DNA
sequence encoding mutant CFTR polypeptide characterized
by cystic fibrosis-associated activity in human
epithelial cells, or the DNA sequence of Figure 1 for
the amino acid residue positions 1 to 1480 yet further
characterized by a three base pair mutation which
results in the deletion of phenylalanine from amino acid
residue position 508;

b) DNA sequences which correspond to fragments of
the sequences of paragraph a) and which include at least
sixteen nucleotides;

c) DNA sequences which comprise at least sixteen
nucleotides and encode a fragment of the amino acid
sequence encoded for by the DNA sequences of paragraph
a); and

d) DNA sequences encoding an epitope encoded by
at least 18 sequential nucleotides in the sequence of
the DNA of paragraph a).

Transcripts of approximately 6,500 nucleotides in 
size are detectable in tissues affected in patients with 
CF. Based upon the isolated nucleotide sequence, the 
predicted protein consists of two similar regions, each 
containing a first domain having properties consistent 
with membrane association and a second domain believed 
to be involved in ATP binding. 
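
This section does not say how the membrane-associated domains were predicted; Figure 11 is described only as a hydropathy profile. One widely used way to compute such a profile is a sliding-window average of the Kyte-Doolittle hydropathy scale, where windows of about 19 residues averaging above roughly 1.6 are candidate membrane-spanning segments. The sketch below assumes that scale and window size (neither is taken from this document) and runs on a made-up peptide, not on the CFTR sequence; `hydropathy_profile` is an illustrative name.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every run of `window` consecutive residues."""
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window by one residue: add the new one, drop the oldest.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


# Made-up peptide: polar flanks around a 19-residue hydrophobic stretch.
seq = "DEKRSTNQ" + "LLIVAVLLIAFLVGLIIAV" + "QNTSRKED"
profile = hydropathy_profile(seq)
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"peak window starts at residue {peak + 1}, mean {profile[peak]:.2f}")
```

For this toy peptide the highest-scoring window is exactly the hydrophobic stretch starting at residue 9, with a mean near 3.4, well above the 1.6 cut-off.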

A 3 bp deletion which results in the omission of a
phenylalanine residue at the center of the first
predicted nucleotide binding domain (amino acid position
508 of the CF gene product) has been detected in CF
patients. This mutation in the normal DNA sequence of
Figure 1 accounts for approximately 70% of the
mutations in cystic fibrosis patients. Extended
haplotype data based on DNA markers closely linked to
the putative disease gene suggest that the remainder of
the CF mutant gene pool consists of multiple, different
mutations. A small set of these latter mutant alleles
(approximately 8%) may confer residual pancreatic
exocrine function in a subgroup of patients who are
pancreatic sufficient.
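
Because three nucleotides make up one codon, a 3 bp deletion removes one amino acid without shifting the reading frame of the downstream sequence. The sketch below shows this on a hypothetical toy gene (not the CFTR sequence), using only the handful of standard-code codons it needs. It also shows that a 3 bp deletion straddling a codon boundary can have the same net effect when the rejoined codon still encodes the original residue.

```python
# Partial standard genetic code: only the codons used in the toy example.
CODONS = {
    "ATG": "M", "AAA": "K", "ATC": "I", "ATT": "I",
    "TTT": "F", "GGT": "G", "GTT": "V", "TAA": "*",
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3; the frame is broken")
    protein = []
    for i in range(0, len(dna), 3):
        residue = CODONS[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete(dna: str, start: int, length: int = 3) -> str:
    """Remove `length` nucleotides beginning at 0-based offset `start`."""
    return dna[:start] + dna[start + length:]


# Hypothetical toy gene, written codon by codon: M K I F G V stop.
normal = "ATG" "AAA" "ATC" "TTT" "GGT" "GTT" "TAA"
print(translate(normal))             # MKIFGV
print(translate(delete(normal, 9)))  # MKIGV: the TTT codon removed exactly
print(translate(delete(normal, 8)))  # MKIGV: CTT removed across a codon boundary
```

In the second deletion the remaining codon ATT still encodes isoleucine, so again only the phenylalanine is lost.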

2.1 CHROMOSOME WALKING AND JUMPING

Large amounts of the DNA surrounding the D7S122 and
D7S340 linkage regions of Rommens et al., supra, were
searched for candidate gene sequences. In addition to
conventional chromosome walking methods, chromosome
jumping techniques were employed to accelerate the
search process. From each jump endpoint, a new
bidirectional walk could be initiated. Sequential walks
halted by "unclonable" regions, often encountered in the
mammalian genome, could be circumvented by chromosome
jumping.

The chromosome jumping library used has been
described previously [Collins et al., Science 235, 1046
(1987); Ianuzzi et al., Am. J. Hum. Genet. 44, 695
(1989)]. The original library was prepared from a
preparative pulsed field gel, and was intended to
contain partial EcoRI fragments of 70 - 130 kb;
subsequent experience with this library indicates that
smaller fragments were also represented, and jump sizes
of 25 - 110 kb have been found. The library was plated
on the sup⁻ host MC1061 and screened by standard
techniques [Maniatis et al.]. Positive clones were
subcloned into pBRA23Ava, and the beginning and end of
each jump were identified by EcoRI and AvaI digestion,
as described in Collins, Genome Analysis: A Practical
Approach (IRL, London, 1988), pp. 73-94. For each clone,
a fragment from the end of the jump was checked to
confirm its location on chromosome 7. The contiguous
chromosome region covered by chromosome walking and
jumping was about 250 kb. Direction of the jumps was
biased by careful choice of probes, as described by
Collins et al. and Ianuzzi et al., supra. The entire
region cloned, including the sequences isolated with the
use of the CF gene cDNA, is approximately 500 kb.
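
The bookkeeping behind a figure such as "about 250 kb of contiguous region" amounts to merging the map coordinates of overlapping clones into contigs and noting the gaps between them. The sketch below illustrates that step only; the clone coordinates are invented and do not come from Figure 2, jump clones are not modeled (only the walk clones that cover sequence contiguously), and `merge_clones` and `find_gaps` are hypothetical names.

```python
def merge_clones(clones: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping (start, end) clone coordinates, in kb, into contigs."""
    contigs: list[list[int]] = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            # Overlaps (or abuts) the current contig: extend it.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])
    return [(s, e) for s, e in contigs]


def find_gaps(contigs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Uncloned intervals between consecutive contigs."""
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(contigs, contigs[1:])]


# Invented walk clones, in kb along the map.
clones = [(0, 40), (35, 75), (70, 110), (150, 190), (185, 230)]
contigs = merge_clones(clones)
print(contigs)                                        # [(0, 110), (150, 230)]
print(find_gaps(contigs))                             # [(110, 150)]
print(sum(e - s for s, e in contigs), "kb covered")   # 190 kb covered
```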

The schematic representation of the chromosome
walking and jumping strategy is illustrated in Figure 2.
CF gene exons are indicated by Roman numerals in this
Figure. Horizontal lines above the map indicate walk
steps, whereas the arcs above the map indicate jump
steps. The Figure proceeds from left to right in each
of six tiers, with the direction of ends toward 7cen and
7qter as indicated. The restriction map for the enzymes
EcoRI, HindIII, and BamHI is shown above the solid line,
spanning the entire cloned region. Restriction sites
indicated with arrows rather than vertical lines
indicate sites which have not been unequivocally
positioned. Additional restriction sites for other
enzymes are shown below the line. Gaps in the cloned
region are indicated by | |. These occur only in the
portion detected by cDNA clones of the CF transcript.
These gaps are unlikely to be large, based on pulsed
field mapping of the region. For the walking clones,
indicated by horizontal arrows above the map, the
direction of the arrow indicates the walking progress
obtained with each clone. Cosmid clones begin with the
letter c; all other clones are phage. Cosmid CF26
proved to be a chimera; the dashed portion is derived
from a different genomic fragment on another chromosome.
Roman numerals I through XXIV indicate the location of
exons of the CF gene. The horizontal boxes shown above
the line are probes used during the experiments. Three
of the probes represent independent subcloning of
fragments previously identified to detect polymorphisms
in this region: H2.3A corresponds to probe XV2C (X.
Estivill et al., Nature 326: 840 (1987)), probe E1
corresponds to KM19 (Estivill, supra), and probe E4.1
corresponds to Mp6d.9 (X. Estivill et al. Am . J, Hum . 
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SaUSS*. 44, 704 (1989)). G-2 is a subfragment of E6 
which detects a transcribed sequence, ri..i, ri 59/ and 
R160 are synthetic oligonucleotides constructed from 
parts of the IRP locus sequence [b\"j. Wainwright et al, 
5 EMBQ Ji , 7: 1743 (1988)], indicating the location of 
this transcript on the genomic map. 

The two independently isolated DNA markers,
D7S122 (pH131) and D7S340 (TM58), were only
approximately 10 kb apart (Figure 2), so the walks and
jumps were essentially initiated from a single point.
The direction of walking and jumping with respect to MET
and D7S8 was then established by crossing
several rare-cutting restriction endonuclease
recognition sites (such as those for Xho I, Nru I and
Not I, see Figure 2) and by reference to the long
range physical maps of J. M. Rommens et al, Am. J. Hum.
Genet., in press; A. M. Poustka et al, Genomics 2, 337
(1988); and M. L. Drumm et al, Genomics 2, 346 (1988).
The pulsed field mapping data also revealed that the
Not I site identified by the inventors of the present
invention (see Figure 2, position 113 kb) corresponded
to the one previously found associated with the IRP
locus (Estivill et al 1987, supra). Since subsequent
genetic studies showed that CF was most likely located
between IRP and D7S8 [M. Farrall et al, Am. J. Hum.
Genet. 43, 471 (1988); B.-S. Kerem et al, Am. J. Hum.
Genet. 44, 827 (1989)], the walking and jumping effort
was continued exclusively towards cloning of this
interval. It is appreciated, however, that other coding
regions identified in Figure 2, for example G-2,
CF14 and CF16, were located and extensively
investigated. On the basis of genetic data and sequence
analysis, these investigations showed that these other
regions were not the CF gene. Given the
lack of knowledge of the location of the CF gene and its
characteristics, the extensive and time-consuming
examination of the nearby presumptive coding regions did
not advance the search for the CF gene.
However, these investigations were necessary in order to
rule out the possibility that the CF gene lay in those
regions.

Three regions in the 280 kb segment were found not
to be readily recoverable in the amplified genomic
libraries initially used. These less clonable regions
were located near the DNA segments H2.3A and X.6, and
just beyond cosmid cW44, at positions 75-100 kb, 205-225
kb, and 275-285 kb in Figure 2, respectively. The
recombinant clones near H2.3A were found to be very
unstable, with dramatic rearrangements after only a few
passages of bacterial culture. To fill in the resulting
gaps, primary walking libraries were constructed using
special host-vector systems which have been reported to
allow propagation of unstable sequences [A. R. Wyman, L.
B. Wolfe, D. Botstein, Proc. Natl. Acad. Sci. U.S.A.
82, 2880 (1985); K. F. Wertman, A. R. Wyman, D.
Botstein, Gene 49, 253 (1986); A. R. Wyman, K. F.
Wertman, D. Barker, C. Helms, W. H. Petri, Gene 49, 263
(1986)]. Although the region near cosmid cW44 remains
to be recovered, the region near X.6 was successfully
rescued with these libraries.

2.2 CONSTRUCTION OF GENOMIC LIBRARIES

Genomic libraries were constructed according to
procedures described in Maniatis et al, Molecular
Cloning: A Laboratory Manual (Cold Spring Harbor
Laboratory, Cold Spring Harbor, New York 1982), and are
listed in Table 1. These include eight phage
libraries, one of which was provided by T. Maniatis
[Fritsch et al, Cell, 19:959 (1980)]; the rest were
constructed as part of this work according to procedures
described in Maniatis et al, supra. Four phage
libraries were cloned in λDASH (commercially available
from Stratagene) and three in λFIX (commercially



WO 91/02796 



21 



PCT/CA90/00267



available from Stratagene), with vector arms provided by
the manufacturer. One λDASH library was constructed
from Sau 3A-partially digested DNA from a human-hamster
hybrid containing human chromosome 7 (4AF/102/K015)
[Rommens et al, Am. J. Hum. Genet. 43, 4 (1988)], and the
other libraries from partial Sau3A, total BamHI, or
total EcoRI digestion of human peripheral blood or
lymphoblastoid DNA. To avoid loss of unstable
sequences, five of the phage libraries were propagated
on the recombination-deficient hosts DB1316 (recD⁻) or
CES200 (recBC⁻) [Wyman et al, supra; Wertman et al, supra;
Wyman et al, supra], or on TAP90 [Patterson et al, Nucleic
Acids Res. 15:6298 (1987)]. Three cosmid libraries were
then constructed. In one, the vector pCV108 [Lau et al,
Proc. Natl. Acad. Sci. USA 80:5225 (1983)] was used to
clone partially digested (Sau 3A) DNA from 4AF/102/K015
[Rommens et al, Am. J. Hum. Genet. 43:4 (1988)]. A second
cosmid library was prepared by cloning partially
digested (Mbo I) human lymphoblastoid DNA into the
vector pWE-IL2R, prepared by inserting the RSV (Rous
Sarcoma Virus) promoter-driven cDNA for the
interleukin-2 receptor α-chain (supplied by M. Fordis
and B. Howard) in place of the neo-resistance gene of
pWE15 [Wahl et al, Proc. Natl. Acad. Sci. USA 84:2160
(1987)]. An additional partial Mbo I cosmid library was
prepared in the vector pWE-IL2-Sal, created by inserting
a Sal I linker into the Bam HI cloning site of pWE-IL2R
(M. Drumm, unpublished data); this allows the use of the
partial fill-in technique to ligate Sal I and Mbo I
ends, preventing tandem insertions [Zabarovsky et al,
Gene 42:19 (1986)]. Cosmid libraries were propagated in
E. coli host strains DH1 or 490A [M. Steinmetz, A.
Winoto, K. Minard, L. Hood, Cell 28, 489 (1982)].
TABLE 1
Genomic libraries

Vector                Source of human DNA                            Host     Complexity            Reference
λCharon 4A            HaeIII/AluI-partially digested total human     LE392    1 x 10^6 (amplified)  Lawn et al 1980
                      liver DNA
pCV108                Sau3A-partially digested DNA from              DH1      3 x 10^5 (amplified)
                      4AF/102/K015
λDASH                 Sau3A-partially digested DNA from              LE392    1 x 10^6 (amplified)
                      4AF/102/K015
λDASH                 Sau3A-partially digested total human           DB1316   1.5 x 10^6
                      peripheral blood DNA
λDASH                 BamHI-digested total human peripheral          DB1316   1.5 x 10^6
                      blood DNA
λDASH                 EcoRI-partially digested total human           DB1316   8 x 10^6
                      peripheral blood DNA
λFIX                  MboI-partially digested human                  LE392    1.5 x 10^6
                      lymphoblastoid DNA
λFIX                  MboI-partially digested human                  CES200   1.2 x 10^6
                      lymphoblastoid DNA
λFIX                  MboI-partially digested human                  TAP90    1.3 x 10^6
                      lymphoblastoid DNA
pWE-IL2R              MboI-partially digested human                  490A     5 x 10^5
                      lymphoblastoid DNA
pWE-IL2R-Sal          MboI-partially digested human                  490A     1.2 x 10^6
                      lymphoblastoid DNA
λCh3A Δlac            EcoRI-partially digested (24-110 kb)           MC1061   3 x 10^6              Collins et al supra and
(jumping)             human lymphoblastoid DNA                                                      Iannuzzi et al supra
Three of the phage libraries were propagated and
amplified in E. coli bacterial strain LE392. Four
subsequent libraries were plated on the recombination-deficient
hosts DB1316 (recD⁻) or CES200 (recBC⁻)
[Wyman 1985, supra; Wertman 1986, supra; and Wyman 1986,
supra], or in one case on TAP90 [T. A. Patterson and M. Dean,
Nucleic Acids Res. 15, 6298 (1987)].

Single copy DNA segments (free of repetitive
elements) near the ends of each phage or cosmid insert
were purified and used as probes for library screening
to isolate overlapping DNA fragments by standard
procedures (Maniatis et al, supra).

1-2 x 10^6 phage clones were plated on 25-30 150-mm
Petri dishes with the appropriate indicator bacterial
host and incubated at 37°C for 10-16 hr. Duplicate
"lifts" were prepared for each plate with
nitrocellulose or nylon membranes, prehybridized and
hybridized under conditions described [Rommens et al,
1988, supra]. Probes were labelled with 32P to a
specific activity of >5 x 10^8 cpm/µg using the random
priming procedure [A. P. Feinberg and B. Vogelstein,
Anal. Biochem. 132, 6 (1983)]. The cosmid library was
spread on ampicillin-containing plates and screened in a
similar manner.
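The number of plaques screened at each walk step can be put in context with the Clarke-Carbon estimate, a standard formula that this document does not cite. The sketch below assumes an average insert of about 15 kb and a haploid human genome of about 3 x 10^6 kb. Both values are illustrative assumptions, not figures from the text.

```python
import math

def clarke_carbon(prob: float, insert_kb: float, genome_kb: float) -> int:
    """Number of independent clones N needed for a given single-copy
    sequence to be present with probability `prob`:
        N = ln(1 - P) / ln(1 - f),  with f = insert size / genome size."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1 - prob) / math.log(1 - f))

# Assumed, illustrative values: ~15 kb phage inserts, ~3 x 10^6 kb genome.
n_phage = clarke_carbon(0.99, 15, 3e6)
print(f"{n_phage:.2e}")  # on the order of 9 x 10^5 clones
```

Under these assumptions, a library of roughly 9 x 10^5 clones gives a 99% chance of containing a given sequence. That figure is of the same order as the 1-2 x 10^6 phage clones plated per screen.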
DNA probes which gave high background signals
could often be used more successfully by preannealing
the boiled probe with 250 µg/ml sheared denatured
placental DNA for 60 minutes prior to adding the probe
to the hybridization bag.
For each walk step, the identity of the cloned DNA
fragment was determined by hybridization with a somatic
cell hybrid panel to confirm its chromosomal location,
and by restriction mapping and Southern blot analysis to
confirm its colinearity with the genome.
The total combined cloned region of the genomic DNA
sequences isolated and the overlapping cDNA clones
extended >500 kb. To ensure that the DNA segments
isolated by the chromosome walking and jumping
procedures were colinear with the genomic sequence,
each segment was examined by:

(a) hybridization analysis with human-rodent
somatic hybrid cell lines to confirm chromosome 7
localization,

(b) pulsed field gel electrophoresis, and

(c) comparison of the restriction map of the cloned
DNA to that of the genomic DNA.

Accordingly, single copy human DNA sequences were
isolated from each recombinant phage and cosmid clone
and used as probes in each of these hybridization
analyses, performed by the procedure of Maniatis et
al, supra.
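Check (c) amounts to asking whether every internal restriction fragment of a clone has a counterpart of the same size in genomic DNA. The sketch below is a hypothetical, simplified version of that comparison. The fragment sizes are invented, the 5% sizing tolerance is an assumed parameter, and the matching is greedy rather than exhaustive.

```python
from typing import List

def maps_agree(cloned_kb: List[float], genomic_kb: List[float],
               tolerance: float = 0.05) -> bool:
    """Greedy check that every internal restriction fragment of a clone
    (sizes in kb) matches a distinct genomic band within a fractional
    sizing tolerance. The caller should exclude the clone's end fragments,
    because those are truncated at the vector junction."""
    remaining = sorted(genomic_kb)
    for size in sorted(cloned_kb):
        match = next((g for g in remaining if abs(g - size) <= tolerance * g), None)
        if match is None:
            return False          # no genomic band of this size: not colinear
        remaining.remove(match)   # each genomic band may be used only once
    return True

# Hypothetical EcoRI fragment sizes (kb), for illustration only.
genomic = [9.5, 6.1, 4.3, 2.1]
print(maps_agree([4.3, 2.1, 6.0], genomic))  # True
print(maps_agree([4.3, 3.0], genomic))       # False
```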

While the majority of phage and cosmid isolates
represented correct walk and jump clones, a few resulted
from cloning artifacts or from cross-hybridizing sequences
in other regions of the human genome, or of the
hamster genome in cases where the libraries were derived
from a human-hamster hybrid cell line. Confirmation of
correct localization was particularly important for
clones isolated by chromosome jumping. Many of the jump clones
examined gave inconclusive information that led the
investigation away from the gene.

2.3 CONFIRMATION OF THE RESTRICTION MAP

Further confirmation of the overall physical map of
the overlapping clones was obtained by long range
restriction mapping analysis with the use of pulsed
field gel electrophoresis (J. M. Rommens et al, Am. J.
Hum. Genet., in press; A. M. Poustka et al, 1988, supra;
M. L. Drumm et al, 1988, supra).

Figures 3A to 3E illustrate the findings of the
long range restriction mapping study; a schematic
representation of the region is given in Panel E. DNA
from the human-hamster cell line 4AF/102/K015 was
digested with the enzymes (A) Sal I, (B) Xho I, (C) Sfi
I and (D) Nae I, separated by pulsed field gel
electrophoresis, and transferred to Zetaprobe® (Bio-Rad).
For each enzyme, a single blot was sequentially
hybridized with the probes indicated below each of the
panels of Figures 3A to 3D, with the blot stripped
between hybridizations. The symbols for each enzyme in
Figure 3E are: A, Nae I; B, Bss HII; F, Sfi I; L, Sal I;
M, Mlu I; N, Not I; R, Nru I; and X, Xho I. C
corresponds to the compression zone region of the gel.
DNA preparations, restriction digestion, and crossed
field gel electrophoresis methods have been described
(Rommens et al, in press, supra). The gels in Figure 3
were run in 0.5X TBE at 7 volts/cm for 20 hours with
switching linearly ramped from 10-40 seconds for (A),
(B), and (C), and at 8 volts/cm for 20 hours with
switching ramped linearly from 50-150 seconds for (D).
Schematic interpretations of the hybridization pattern
are given below each panel. Fragment lengths are in
kilobases and were sized by comparison to oligomerized
bacteriophage λ DNA and Saccharomyces cerevisiae
chromosomes.
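Sizing against oligomerized λ DNA relies on the ladder being a series of multiples of the 48.5 kb λ genome. The Python sketch below builds such a ladder and estimates a band's size by log-linear interpolation between the two flanking markers. The migration distances are hypothetical, and log-linear interpolation is an assumed local approximation rather than the sizing procedure used here. With ramped pulse times, the size-mobility relationship is only roughly log-linear, and only between nearby markers.

```python
import math
from typing import List, Tuple

LAMBDA_KB = 48.5  # length of one bacteriophage lambda genome, in kb

def lambda_ladder(n: int) -> List[float]:
    """Sizes (kb) of oligomerized lambda DNA: 1x, 2x, ..., nx the monomer."""
    return [k * LAMBDA_KB for k in range(1, n + 1)]

def size_by_interpolation(distance_mm: float,
                          markers: List[Tuple[float, float]]) -> float:
    """Estimate a band's size from its migration distance by log-linear
    interpolation between the two flanking markers, which are given as
    (distance_mm, size_kb) pairs. Larger fragments migrate less far."""
    pts = sorted(markers)  # increasing distance, hence decreasing size
    for (d1, s1), (d2, s2) in zip(pts, pts[1:]):
        if d1 <= distance_mm <= d2:
            t = (distance_mm - d1) / (d2 - d1)
            return math.exp(math.log(s1) + t * (math.log(s2) - math.log(s1)))
    raise ValueError("band lies outside the range of the size markers")

ladder = lambda_ladder(3)  # [48.5, 97.0, 145.5]
# Hypothetical migration distances for the 97.0 kb and 48.5 kb ladder bands.
markers = [(40.0, ladder[1]), (50.0, ladder[0])]
print(round(size_by_interpolation(45.0, markers), 1))
```

A band halfway between two markers is assigned the geometric mean of their sizes, which is what log-linear interpolation implies.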

H4.0, J44, and EG1.4 are genomic probes generated from
the walking and jumping experiments (see Figure 2).
J30 was isolated by four consecutive jumps from
D7S8 (Collins et al, 1987, supra; Iannuzzi et al, 1989,
supra; M. Dean et al, submitted for publication). 10-1,
B.75, and CE1.5/1.0 are cDNA probes which cover
different regions of the CF transcript: 10-1 contains
exons I-VI, B.75 contains exons V-XII, and CE1.5/1.0
contains exons XII-XXIV. Shown in Figure 3E is a
composite map of the entire MET-D7S8 interval. The
boxed region indicates the segment cloned by walking and
jumping, and the slashed portion indicates the region
covered by the CF transcript. The CpG-rich region
associated with the D7S23 locus (Estivill et al, 1987,
supra) is at the Not I site shown in parentheses. This
and other sites shown in parentheses or square brackets
do not cut in 4AF/102/K015, but have been observed in
human lymphoblast cell lines.
2.4 IDENTIFICATION OF CF GENE

Based on the findings of long range restriction
mapping detailed above, it was determined that the entire
CF gene is contained on a 380 kb Sal I fragment.
Alignment of the restriction sites derived from pulsed
field gel analysis with those identified in the partially
overlapping genomic DNA clones revealed that the size of
the CF gene was approximately 250 kb.

The most informative restriction enzyme for
aligning the map of the cloned DNA fragments with the
long range restriction map was Xho I; all 9 of the Xho I
sites identified in the recombinant DNA clones
appeared to be susceptible to at least partial cleavage
in genomic DNA (compare maps in Figures 1 and 2).
Furthermore, hybridization analysis with probes derived
from the 3' end of the CF gene identified 2 Sfi I sites
and confirmed the position of an anticipated Nae I site.

These findings further supported the conclusion
that the DNA segments isolated by the chromosome walking
and jumping procedures were colinear with the genomic
sequence.

2.5 CRITERIA FOR IDENTIFICATION

A positive result based on one or more of the
following criteria suggested that a cloned DNA segment
may contain candidate gene sequences:

(a) detection of cross-hybridizing sequences in
other species (as many genes show evolutionary
conservation),

(b) identification of CpG islands, which often mark
the 5' end of vertebrate genes [A. P. Bird, Nature, 321,
209 (1986); M. Gardiner-Garden and M. Frommer, J. Mol.
Biol. 196, 261 (1987)],

(c) examination of possible mRNA transcripts in
tissues affected in CF patients,

(d) isolation of corresponding cDNA sequences, and

(e) identification of open reading frames by direct
sequencing of cloned DNA segments.
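Criterion (b) is commonly applied with the thresholds usually attributed to Gardiner-Garden and Frommer, cited above:

- a length of at least 200 bp,
- a G+C content above 50%, and
- an observed/expected CpG ratio above 0.6.

This document does not state which thresholds were used, so the defaults in the following sketch are assumptions, and the test sequences are synthetic.

```python
def cpg_statistics(seq: str):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence.
    Observed/expected = (number of CpG dinucleotides * length) / (#C * #G)."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = sum(1 for i in range(n - 1) if s[i] == "C" and s[i + 1] == "G")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Gardiner-Garden and Frommer style test; the default thresholds are assumed."""
    gc_fraction, obs_exp = cpg_statistics(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp

print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```

In practice the test is applied in a sliding window along a sequenced segment such as H1.6 or E4.3, rather than to a whole segment at once.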

Cross-species hybridization showed strong sequence
conservation between human and bovine DNA when CF14,
E4.3 and H1.6 were used as probes; the results are
shown in Figures 4A, 4B and 4C.

Human, bovine, mouse, hamster, and chicken genomic
DNAs were digested with Eco RI (R), Hind III (H), and
Pst I (P), electrophoresed, and blotted to Zetabind®
(Bio-Rad). The hybridization procedures of Rommens et
al, 1988, supra, were used, with the most stringent wash
at 55°C, 0.2X SSC, and 0.1% SDS. The probes used for
hybridization in Figure 4 included: (A) entire
cosmid CF14, (B) E4.3, and (C) H1.6. In the schematic of
Figure 4D, the shaded region indicates the area of
cross-species conservation.

The fact that different subsets of bands were
detected in bovine DNA with these two overlapping DNA
segments (H1.6 and E4.3) suggested that the conserved
sequences were located at the boundaries of the
overlapping region (Figure 4D). When these DNA
segments were used to detect RNA transcripts from a
variety of tissues, no hybridization signal was
detected. In an attempt to understand the cross-hybridizing
region and to identify possible open reading
frames, the DNA sequences of the entire H1.6 and part of
the E4.3 fragment were determined. The results showed
that, except for a long stretch of CG-rich sequence
containing the recognition sites for two restriction
enzymes (Bss HII and Sac II) often found associated
with undermethylated CpG islands, there were only short
open reading frames, which could not easily explain the
strong cross-species hybridization signals.

To examine the methylation status of this highly
CpG-rich region revealed by sequencing, genomic DNA
samples prepared from fibroblasts and lymphoblasts were
digested with the restriction enzymes Hpa II and Msp I
and analyzed by gel blot hybridization. The enzyme Hpa
II cuts the DNA sequence 5'-CCGG-3' only when the
second cytosine is unmethylated, whereas Msp I cuts this
sequence whether or not that cytosine is methylated. Small
DNA fragments were generated by both enzymes, indicating
that this CpG-rich region is indeed undermethylated in
genomic DNA. The gel-blot hybridization with the E4.3
segment (Figure 6) reveals very small hybridizing
fragments with both enzymes, indicating the presence of
a hypomethylated CpG island.
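The Hpa II/Msp I comparison can be modelled as an in-silico digest. Both enzymes cut C^CGG, but Hpa II skips sites whose internal cytosine is methylated. The sketch below is a simplified, single-strand model on a hypothetical sequence. It is meant only to show why identical small fragments from both enzymes point to an unmethylated region, while larger Hpa II fragments would indicate methylation.

```python
import re
from typing import FrozenSet, List

def ccgg_sites(seq: str) -> List[int]:
    """0-based start positions of every 5'-CCGG-3' site in `seq`."""
    return [m.start() for m in re.finditer("(?=CCGG)", seq.upper())]

def digest(seq: str, enzyme: str,
           methylated: FrozenSet[int] = frozenset()) -> List[int]:
    """Fragment lengths after cutting C^CGG with Msp I or Hpa II.
    `methylated` holds start positions of sites whose internal (second)
    cytosine is methylated; these block Hpa II but not Msp I."""
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError("enzyme must be 'MspI' or 'HpaII'")
    cuts = [p + 1 for p in ccgg_sites(seq)  # both enzymes cut between C and CGG
            if enzyme == "MspI" or p not in methylated]
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]

seq = "AAAA" + "CCGG" + "TTTTTT" + "CCGG" + "AA"  # hypothetical 20-bp sequence
print(digest(seq, "MspI"))                             # [5, 10, 5]
print(digest(seq, "HpaII"))                            # [5, 10, 5]  (unmethylated)
print(digest(seq, "HpaII", methylated=frozenset({14})))  # [5, 15]
```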

The above "results strongly suggest the presence of 
a coding region at this locus. Two DNA segments (E4.3 
and Hi. 6) which detected cross-species hybridization 

20 signals from this area were used as probes to screen 

cDNA libraries made from several tissues and cell types. 

cDNA libraries from cultured epithelial cells were
prepared as follows. Sweat gland cells derived from a
non-CF individual and from a CF patient were grown to
first passage as described [G. Collie et al, In Vitro
Cell. Dev. Biol. 21, 592, 1985]. The presence of
outwardly rectifying channels was confirmed in these
cells (J. A. Tabcharani, T. J. Jensen, J. R. Riordan, J. W.
Hanrahan, J. Memb. Biol., in press), but the CF cells
were insensitive to activation by cyclic AMP (T. J.
Jensen, J. W. Hanrahan, J. A. Tabcharani, M. Buchwald and
J. R. Riordan, Pediatric Pulmonology, Supplement 2, 100,
1988). RNA was isolated from these cells by the method of J. M.
Chirgwin et al (Biochemistry 18, 5294, 1979). Poly(A)+
RNA was selected (H. Aviv and P. Leder, Proc. Natl.
Acad. Sci. USA 69, 1408, 1972) and used as template for
the synthesis of cDNA with oligo(dT)12-18 as a primer.
The second strand was synthesized according to Gubler
and Hoffman (Gene 25, 263, 1983). The cDNA was methylated
with Eco RI methylase, and its ends were made flush with T4
DNA polymerase. Phosphorylated Eco RI linkers were
ligated to the cDNA, which was then restricted with Eco RI.
Removal of excess linkers and partial size
fractionation were achieved by Biogel A-50
chromatography. The cDNAs were then ligated into the
Eco RI site of the commercially available lambda ZAP.
Recombinants were packaged and propagated in E. coli
BB4. Portions of the packaging mixes were amplified, and
the remainder was retained for screening prior to
amplification. The same procedures were used to
construct a library from RNA isolated from preconfluent
cultures of the T-84 colonic carcinoma cell line
(Dharmsathaphorn, K. et al, Am. J. Physiol. 246,
G204, 1984). The numbers of independent recombinants in
the three libraries were: 2 x 10^6 for the non-CF sweat
gland cells, 4.5 x 10^6 for the CF sweat gland cells and
[...] 50,000 per 15 cm plate, and plaque lifts were made using nylon
membranes (Biodyne) and probed with DNA fragments
labelled with 32P using DNA polymerase I and a random
mixture of oligonucleotides as primer. Hybridization
conditions were according to G. M. Wahl and S. L. Berger
(Meth. Enzymol. 152, 415, 1987). Bluescript II plasmids
were rescued from plaque-purified clones by excision
with M13 helper phage. The lung and pancreas libraries
were purchased from Clontech Lab Inc., with reported
sizes of 1.4 x 10^6 and 1.7 x 10^6 independent clones.

After screening 7 different libraries, each
containing 1 x 10^5 - 5 x 10^6 independent clones, a
single clone (identified as 10-1) was isolated with H1.6
from a cDNA library made from the cultured sweat gland
epithelial cells of an unaffected (non-CF) individual.
DNA sequencing analysis showed that 10-1 contained
an insert of 920 bp and one potential long open
reading frame (ORF). Since one end of the sequence
shared perfect sequence identity with H1.6, it was
concluded that the cDNA clone was probably derived from
this region. The DNA sequence in common was, however,
only 113 bp long (see Figures 1 and 7). As detailed
below, this sequence in fact corresponded to the 5'-most
exon of the putative CF gene. The short sequence
overlap thus explained the weak hybridization signals in
library screening and the inability to detect transcripts in
RNA gel-blot analysis. In addition, the orientation of
the transcription unit was tentatively established on
the basis of alignment of the genomic DNA sequence with
the presumptive ORF of 10-1.
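Identifying an open reading frame, as was done for 10-1, can be illustrated with a minimal scan for the longest ATG-to-stop stretch in each of the three forward frames. This is a hypothetical sketch on a toy sequence, not the analysis software used by the inventors. It ignores the reverse strand, and it does not report an ORF that runs off the end of a partial cDNA; a real analysis of a clone such as 10-1 would need to handle both cases.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Longest ATG-initiated, stop-terminated ORF on the given strand, as a
    half-open (start, end) interval that includes the stop codon."""
    s = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None                   # look for the next ATG in this frame
    return best

orf = longest_orf("CCATGAAATTTGGGTAGCC")  # hypothetical toy sequence
print(orf)                                 # (2, 17)
print((orf[1] - orf[0]) // 3 - 1)          # 4 amino acids encoded
```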

Since the corresponding transcript was estimated to
be approximately 6500 nucleotides in length by RNA gel-blot
hybridization experiments, further cDNA library
screening was required in order to clone the remainder
of the coding region. As a result of several
successive screenings with cDNA libraries generated from
the colonic carcinoma cell line T84, normal and CF sweat
gland cells, pancreas and adult lungs, 18 additional
clones were isolated (Figure 7, discussed in greater
detail below). DNA sequence analysis
revealed that none of these cDNA clones corresponded to
the full length of the observed transcript, but it was
possible to derive a consensus sequence based on
overlapping regions. Additional cDNA clones
corresponding to the 5' and 3' ends of the transcript
were derived from 5' and 3' primer-extension
experiments. Together, these clones span a total of
about 6.1 kb and contain an ORF capable of encoding a
polypeptide of 1480 amino acid residues (Figure 1).
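Deriving a consensus from overlapping cDNA clones begins with joining sequences whose ends overlap. The sketch below is a deliberately simple illustration built on stated assumptions. It requires an exact suffix-prefix match of a chosen minimum length, and its example sequences are hypothetical. It does not tolerate sequencing errors or intron-containing insertions like those found in many of these clones, both of which a real assembly would have to handle.

```python
from typing import Optional

def join_overlapping(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Join two clone sequences if a suffix of `left` exactly matches a
    prefix of `right` over at least `min_overlap` bases, preferring the
    longest such overlap. Returns None if no qualifying overlap exists."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None

a = "ACGTACGGTTCA"  # hypothetical clone ending
b = "GGTTCAGGAT"    # hypothetical clone overlapping it by 6 bases
print(join_overlapping(a, b, min_overlap=4))          # ACGTACGGTTCAGGAT
print(join_overlapping(a, "TTTTTTT", min_overlap=4))  # None
```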

Unusually, most of the cDNA
clones isolated here contained sequence insertions at
various locations of the restriction map of Figure 7.
The map details the genomic structure of the CF gene.
Exon/intron boundaries are given, and all cDNA clones
isolated are schematically represented on the upper half
of the figure. Many of these extra sequences clearly
corresponded to intron regions reverse transcribed
during the construction of the cDNA, as revealed upon
alignment with genomic DNA sequences.

Since the number of recombinant cDNA clones for the
CF gene detected in the library screening was much lower
than would have been expected from the abundance of
transcript estimated from RNA hybridization experiments,
it seemed probable that the clones that contained
aberrant structures were preferentially retained while
the proper clones were lost during propagation.

Consistent with this interpretation, poor growth was
observed for the majority of the recombinant clones
isolated in this study, regardless of the vector used.
The procedures used to obtain the 5' and 3' ends of
the cDNA were similar to those described by M. Frohman et
al (Proc. Natl. Acad. Sci. USA, 85, 8998-9002, 1988).
For the 5' end clones, total pancreas and T84 poly(A)+
RNA samples were reverse transcribed using a primer
(10b) specific to exon 2, in the same way as
described for the primer extension reaction, except that a
radioactive tracer was included in the reaction. The
fractions collected from an agarose bead column of the
first strand synthesis were assayed by polymerase chain
reaction (PCR) of the eluted fractions. The
oligonucleotides used were within the 10-1 sequence (145
nucleotides apart), just 5' of the extension primer. The
earliest fractions yielding PCR product were pooled,
concentrated by evaporation, and subsequently tailed with
terminal deoxynucleotidyl transferase (BRL Labs) and
dATP as recommended by the supplier (BRL Labs). A
second strand synthesis was then carried out with Taq
polymerase (Cetus, AmpliTaq®) using an oligonucleotide
containing a tailed linker sequence
5' CGGAATTCTCGAGATC(T)12 3'.
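The tailed linker can be scanned for the recognition sequences of the enzymes named in the cloning steps that follow. The sketch below uses the standard recognition sequences of EcoRI, XhoI and BglII, plus the four-base Sau3AI/MboI site. It reports only what the printed linker sequence contains and makes no claim about how the linker was designed.

```python
import re
from typing import Dict, List

# Standard (palindromic) recognition sequences.
SITES = {"EcoRI": "GAATTC", "XhoI": "CTCGAG", "BglII": "AGATCT", "MboI/Sau3AI": "GATC"}

def find_sites(seq: str, sites: Dict[str, str] = SITES) -> Dict[str, List[int]]:
    """0-based start positions of each recognition sequence in `seq`
    (the lookahead also catches overlapping occurrences)."""
    return {name: [m.start() for m in re.finditer(f"(?={motif})", seq)]
            for name, motif in sites.items()}

linker = "CGGAATTCTCGAGATC" + "T" * 12  # 5' CGGAATTCTCGAGATC(T)12 3'
print(len(linker))  # 28
print(find_sites(linker))
# {'EcoRI': [2], 'XhoI': [7], 'BglII': [11], 'MboI/Sau3AI': [12]}
```

The BglII site appears only because the last linker bases (AGATC) run into the oligo(dT) tail.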

An anchored PCR amplification was then carried out
using the linker sequence and a primer just internal to
the extension primer, which carried an Eco RI
restriction site at its 5' end.
After restriction with the enzymes Eco RI and Bgl II
and agarose gel purification, size-selected products were
cloned into the plasmid Bluescript KS (available from
Stratagene) by standard procedures (Maniatis et al,
supra). Essentially all of the recovered clones
contained inserts of less than 350 nucleotides. To
obtain the 3' end clones, first strand cDNA was prepared
by reverse transcription of 2 µg T84 poly(A)+ RNA
using the tailed linker oligonucleotide previously
described, under conditions similar to those of the primer
extension. PCR amplification was then carried out
with the linker oligonucleotide and three different
oligonucleotides corresponding to known sequences of
clone T16-4.5. A preparative scale reaction (2 x 100
µl) was carried out with one of these oligonucleotides,
with the sequence 5' ATGAAGTCCAAGGATTTAG 3'.

This oligonucleotide is approximately 70
nucleotides upstream of a Hind III site within the known
sequence of T16-4.5. Restriction of the PCR product
with Hind III and Xho I was followed by agarose gel
purification to size-select a band at 1.0-1.4 kb. This
product was then cloned into the plasmid Bluescript KS
(available from Stratagene). Approximately 20% of the
clones obtained hybridized to the 3' end portion of T16-4.5.
All 10 plasmids isolated from these clones had
identical restriction maps, with insert sizes of approximately
1.2 kb. All of the PCR reactions were carried out for
30 cycles in the buffer suggested by the enzyme supplier.
An extension primer positioned 157 nt from the
5' end of the 10-1 clone was used to identify the start point
of the putative CF transcript. The primer was end-labeled
with γ[32P]ATP at 5000 Curies/mmole and T4
polynucleotide kinase and purified by spun column gel
filtration. The radiolabeled primer was then annealed
with 4-5 µg poly(A)+ RNA prepared from T-84 colonic
carcinoma cells in 2X reverse transcriptase buffer for 2
hrs at 60°C. After dilution and addition of AMV
reverse transcriptase (Life Sciences, Inc.), incubation
at 41°C proceeded for 1 hour. The sample was then
adjusted to 0.4 M NaOH and 20 mM EDTA, and finally
neutralized with NH4OAc, pH 4.6, phenol extracted,
ethanol precipitated, redissolved in buffer with
formamide, and analyzed on a polyacrylamide sequencing
gel. Details of these methods have been described
(Meth. Enzymol. 152, 1987, Ed. S. L. Berger, A. R. Kimmel,
Academic Press, N.Y.).

The results of the primer extension experiment using an
extension oligonucleotide primer starting 157
nucleotides from the 5' end of 10-1 are shown in Panel A
of Figure 10. End-labeled φX174 bacteriophage DNA digested
with Hae III (BRL Labs) is used as a size marker. Two
major products are observed, at 216 and 100 nucleotides.
The sequence corresponding to 100 nucleotides in 10-1
is very GC-rich (11/13),
suggesting that this could be a reverse transcriptase
pause site. The 5' anchored PCR results are shown in
Panel B of Figure 10. The 1.4% agarose gel shown on
the left was blotted and transferred to Zetaprobe®
membrane (Bio-Rad Lab). DNA gel blot hybridization with
radiolabeled 10-1 is shown on the right. The 5'
extension products are seen to vary in size from 170-280
nt, with the major product at about 200 nucleotides. The
PCR control lane shows a fragment of 145 nucleotides.
It was obtained by using the test oligomers within the
10-1 sequence. The size markers shown correspond to
sizes of 154, 220/210, 298, 344 and 394 nucleotides (1 kb
ladder purchased from BRL Lab).
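Reading a primer extension result involves simple arithmetic. If the labelled 5' end of the primer lies 157 nt into 10-1, then:

- the 216-nt product extends about 59 nt beyond the 5' end of 10-1, and
- the 100-nt product stops within 10-1.

This interpretation of "157 nt from the 5' end" is an assumption, and the derived offsets are not stated in the text.

```python
def upstream_extension(product_nt: int, primer_offset_nt: int) -> int:
    """Distance (nt) by which an extension product reaches beyond the 5' end
    of a cDNA clone, assuming the labelled 5' end of the primer maps
    `primer_offset_nt` nucleotides into that clone. A negative value means
    the product stops inside the clone (for example at a pause site)."""
    return product_nt - primer_offset_nt

for product in (216, 100):  # the two major products reported for Panel A
    print(product, upstream_extension(product, 157))
# 216 59
# 100 -57
```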

The schematic shown below Panel B of Figure 10
outlines the procedure for obtaining the double stranded cDNA
used for the amplification and cloning that generated the
clones PA3-5 and TB2-7 shown in Figure 7. The anchored
PCR experiments to characterize the 3' end are shown in
Panel C. As depicted in the schematic below Figure 10C,
three primers whose positions relative to each other were
known were used for amplification with reverse
transcribed T84 RNA as described. These products were
separated on a 1% agarose gel and blotted onto nylon
membrane as described above. DNA-blot hybridization
with the 3' portion of the T16-4.5 clone yielded bands
of sizes that corresponded to the distance between the
specific oligomer used and the 3' end of the transcript.
These bands in lanes 1, 2a and 3 are shown schematically
below Panel C in Figure 10. The band in lane 3 is weak
because only 60 nucleotides of this segment overlap with the
probe used. Also indicated in the schematic, and
shown in lane 2b, is the product generated by
restriction of the anchored PCR product to facilitate
cloning of the THZ-4 clone shown in Figure 7.

DNA-blot hybridization analysis of genomic DNA
digested with EcoRI and HindIII and probed with
portions of cDNAs spanning the entire transcript suggests
that the gene contains at least 24 exons, numbered with
Roman numerals I through XXIV (see Figure 9). These
correspond to the numbers 1 through 24 shown in Figure
7. The size of each band is given in kb.

In Figure 7, open boxes indicate the approximate
positions of the 24 exons, which have been identified by
the isolation of >22 clones from the screening of cDNA
libraries and from anchored PCR experiments designed to
clone the 5' and 3' ends. The lengths in kb of the Eco
RI genomic fragments detected by each exon are also
indicated. The hatched boxes in Figure 7 indicate the
presence of intron sequences, and the stippled boxes
indicate other sequences. The closed box in the lower left
depicts the relative position of the clone
H1.6, used to detect the first cDNA clone 10-1 from among
10^6 phage of the normal sweat gland library. As shown in
Figures 4D and 7, the genomic clone H1.6 partially
overlaps with an EcoRI fragment of 4.3 kb. All of the
cDNA clones shown were hybridized to genomic DNA and/or
were fine restriction mapped. Examples of the
restriction sites occurring within the cDNAs and in the
corresponding genomic fragments are indicated.

With reference to Figure 9, the hybridization
analysis used the following cDNA probes: clone 10-1 for
panel A, T16-1 (3' portion) for panel B, T16-4.5
(central portion) for panel C and T16-4.5 (3' end
portion) for panel D. In panel A of Figure 9, the cDNA
probe 10-1 detects the genomic bands for exons I through
VI. The 3' portion of T16-1 generated by NruI
restriction detects exons IV through XIII, as shown in
Panel B. This probe partially overlaps with 10-1.
Panels C and D, respectively, show genomic bands
detected by the central and 3' end EcoRI fragments of
the clone T16-4.5. Two EcoRI sites occur within the
cDNA sequence and split exons XIII and XIX. As
indicated by the exons in parentheses, two genomic EcoRI
bands correspond to each of these exons. Cross-hybridization
to other genomic fragments was observed.
These bands, indicated by N, are not of chromosome 7
origin, as they did not appear in human-hamster hybrids
containing human chromosome 7. The faint band in panel
D indicated by XI in brackets is believed to be caused
by cross-hybridization of sequences due to internal
homology with the cDNA.
Since 10-1 detected a strong band on gel blot
hybridization of RNA from the T84 colonic carcinoma
cell line, this cDNA was used to screen the library
constructed from that source. Fifteen positives were
obtained, from which clones T6, T6/20, T11, T16-1 and
T13-1 were purified and sequenced. Rescreening of the
same library with a 0.75 kb BamHI-EcoRI fragment from
the 3' end of T16-1 yielded T16-4.5. A 1.8 kb EcoRI
fragment from the 3' end of T16-4.5 yielded T8-B3 and
T12a, the latter of which contained a polyadenylation
signal and tail. Simultaneously, a human lung cDNA
library was screened; many clones were isolated,
including those shown here with the prefix 'CDL'. A
pancreas library was also screened, yielding clone
CDPJ5.

To obtain copies of this transcript from a CF
patient, a cDNA library from RNA of sweat gland
epithelial cells from a patient was screened with the
0.75 kb BamHI-EcoRI fragment from the 3' end of T16-1,
and clones C16-1 and C1-1/5, which covered all but
exon 1, were isolated. These two clones both exhibit a
3 bp deletion in exon 10 which is not present in any
other clone containing that exon. Several clones,
including CDLS26-1 from the lung library and T6/20 and
T13-1 isolated from T84, were derived from partially
processed transcripts. This was confirmed by genomic
hybridization and by sequencing across the exon-intron
boundaries for each clone. T11 also contained
additional sequence at each end. T16-4.5 contained a
small insertion near the boundary between exons 10 and
11 that did not correspond to intron sequence. Clones
CDLS16A, 11a and 13a from the lung library also
contained extraneous sequences of unknown origin. The
clone C16-1 also contained a short insertion
corresponding to a portion of the γ-transposon of E.
coli; this element was not detected in the other clones.
The 5' clones PA3-5, generated from pancreas RNA, and
TB2-7, generated from T84 RNA, both obtained using the anchored PCR
technique, have identical sequences except for a single
nucleotide difference in length at the 5' end, as shown
in Figure 1. The 3' clone THZ-4, obtained from T84 RNA,
contains the 3' sequence of the transcript in
concordance with the genomic sequence of this region.

A combined sequence representing the presumptive
coding region of the CF gene was generated from
overlapping cDNA clones. Since most of the cDNA clones
were apparently derived from unprocessed transcripts,
further studies were performed to ensure the
authenticity of the combined sequence. Each cDNA clone
was first tested for localization to chromosome 7 by
hybridization analysis with a human-hamster somatic
cell hybrid containing a single human chromosome 7 and
by pulsed field gel electrophoresis. Fine restriction
enzyme mapping was also performed for each clone. While
overlapping regions were clearly identifiable for most
of the clones, many contained regions of unique
restriction patterns.

To further characterize these cDNA clones, they
were used as probes in gel hybridization experiments
with EcoRI- or HindIII-digested human genomic DNA. As
shown in Figure 9, five to six different restriction
fragments could be detected with the 10-1 cDNA and a
similar number of fragments with other cDNA clones,
suggesting the presence of multiple exons for the
putative CF gene. The hybridization studies also
identified those cDNA clones with unprocessed intron
sequences, as they showed hybridization to a
subset of genomic DNA fragments. For the confirmed cDNA
clones, their corresponding genomic DNA segments were
isolated and the exons and exon/intron boundaries
sequenced. As indicated in Figure 7, a total of 24
exons were identified. Based on this information and
the results of physical mapping experiments, the gene
locus was estimated to span 250 kb on chromosome 7.
2.6 THE SEQUENCE

Figure 1 shows the nucleotide sequence of the
cloned cDNA encoding CFTR together with the deduced
amino acid sequence. The first base position
corresponds to the first nucleotide in the 5' extension
clone PA3-5, which is one nucleotide longer than TB2-7.
Arrows indicate the position of the transcription initiation
site determined by primer extension analysis. Nucleotide 6129 is
followed by a poly(dA) tract. Positions of exon
junctions are indicated by vertical lines. Potential
membrane-spanning segments were ascertained using the
algorithm of Eisenberg et al., J. Mol. Biol. 179:125
(1984). Potential membrane-spanning segments as analyzed
and shown in Figure 11 are enclosed in boxes in Figure
1. In Figure 11, the mean hydropathy index [Kyte and
Doolittle, J. Molec. Biol. 157:105 (1982)] of 9-residue
peptides is plotted against the amino acid
number. The corresponding positions of features of
secondary structure predicted according to Garnier et
al. [J. Molec. Biol. 157:165 (1982)] are indicated in
the lower panel. Amino acids comprising putative ATP-binding
folds are underlined in Figure 1. Possible
sites of phosphorylation by protein kinases A (PKA) or C
(PKC) are indicated by open and closed circles,
respectively. The open triangle is over the 3 bp (CTT)
which are deleted in CF (see discussion below). The
cDNA clones in Figure 1 were sequenced by the dideoxy
chain termination method employing 35S-labelled
nucleotides or by the Dupont Genesis 2000™ automatic DNA
sequencer.
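The hydropathy plot of Figure 11 averages a per-residue hydropathy value over a sliding 9-residue window. A minimal sketch of that averaging follows, assuming the standard Kyte-Doolittle scale; the helper name and the toy peptide are illustrative only and are not taken from Figure 1.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, J. Mol. Biol. 157:105, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[tuple[int, float]]:
    """Mean hydropathy of every full `window`-residue peptide.

    Returns (residue number, mean) pairs, where the residue number is the
    1-based position of the window's central residue, so the values can be
    plotted against amino acid number.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    seq = protein.upper()
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


# Hypothetical toy peptide (not CFTR): a hydrophobic stretch flanked by lysines.
for residue, score in hydropathy_profile("KKKKLLLLIIIIVVVVKKKK"):
    print(residue, round(score, 2))
```

Peaks in such a profile mark candidate membrane-spanning stretches; the boxed segments in Figure 1 were assigned with the Eisenberg algorithm, which this sketch does not implement.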

The combined cDNA sequence spans 6129 base pairs,
excluding the poly(A) tail at the end of the 3'
untranslated region, and it contains an ORF capable of
encoding a polypeptide of 1480 amino acids (Figure 1).
An ATG (AUG) triplet is present at the beginning of this
ORF (base positions 133-135). Since the nucleotide
sequence surrounding this codon (5'-AGACCAUGCA-3') has
the proposed features of the consensus sequence
(CC)A/GCCAUG(G) of a eukaryotic translation initiation
site, with a highly conserved A at the -3 position, it is
highly probable that this AUG corresponds to the first
methionine codon of the putative polypeptide.
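The argument above rests on two positions flanking the AUG: a purine at -3 and, in the full consensus, a G at +4. A small check of those positions is sketched below; the function name is hypothetical and only the quoted fragment 5'-AGACCAUGCA-3' comes from the text.

```python
def initiation_context(mrna: str) -> dict:
    """Score the -3 and +4 positions around the first AUG of an mRNA fragment.

    Numbering convention: the A of AUG is +1 and the base immediately 5' of
    it is -1 (there is no position 0), so -3 is three bases upstream of the A
    and +4 is the base immediately after the G.
    """
    rna = mrna.upper().replace("T", "U")
    a_index = rna.find("AUG")
    if a_index == -1:
        raise ValueError("fragment contains no AUG")
    if a_index < 3 or a_index + 3 >= len(rna):
        raise ValueError("fragment too short to score positions -3 and +4")
    minus3 = rna[a_index - 3]
    plus4 = rna[a_index + 3]
    return {
        "minus3": minus3,
        "plus4": plus4,
        "purine_at_minus3": minus3 in "AG",
        "G_at_plus4": plus4 == "G",
    }


# Context quoted in the text for the initiation codon at bases 133-135.
print(initiation_context("AGACCAUGCA"))
# {'minus3': 'A', 'plus4': 'C', 'purine_at_minus3': True, 'G_at_plus4': False}
```

The output is consistent with the text: the conserved A is present at -3, while +4 is a C, matching a consensus in which the +4 G is parenthesized as optional.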

To obtain the sequence corresponding to the 5' end
of the transcript, a primer-extension experiment was
performed, as described earlier. As shown in Figure
10A, a primer extension product of approximately 216
nucleotides could be observed, suggesting that the 5'
end of the transcript initiated approximately 60
nucleotides upstream of the end of cDNA clone 10-1. A
modified polymerase chain reaction (anchored PCR) was
then used to facilitate cloning of the 5' end sequences
(Figure 10B). Two independent 5' extension clones, one
from pancreas and the other from T84 RNA, were
characterized by DNA sequencing and were found to differ
by only 1 base in length, indicating the most probable
initiation site for the transcript as shown in Figure 1.

Since most of the initial cDNA clones did not
contain a poly(A) tail indicative of the end of an mRNA,
anchored PCR was also applied to the 3' end of the
transcript (Frohman et al., 1988, supra). Three 3'-extension
oligonucleotides were made to the terminal
portion of the cDNA clone T16-4.5. As shown in Figure
10C, 3 PCR products of different sizes were obtained.
All were consistent with the interpretation that the end
of the transcript was approximately 1.2 kb downstream
of the HindIII site at nucleotide position 5027 (see
Figure 1). The DNA sequence derived from representative
clones was in agreement with that of the T84 cDNA clone
T12a (see Figures 1 and 7) and the sequence of the
corresponding 2.3 kb EcoRI genomic fragment.
3.0 MOLECULAR GENETICS OF CF

3.1 SITES OF EXPRESSION

To visualize the transcript for the putative CF
gene, RNA gel blot hybridization experiments were
performed with the 10-1 cDNA as probe. The RNA
hybridization results are shown in Figure 8.

RNA samples were prepared from tissue samples
obtained from surgical pathology or at autopsy according
to methods previously described (A.M. Kimmel, S.L.
Berger, eds., Meth. Enzymol. 152, 1987). Formaldehyde
gels were transferred onto nylon membranes (Zetaprobe™;
BioRad Lab). The membranes were then hybridized
with DNA probes labeled to high specific activity by the
random priming method (A.P. Feinberg and B. Vogelstein,
Anal. Biochem. 132, 6, 1983) according to previously
published procedures (J. Rommens et al., Am. J. Hum.
Genet. 43, 645-663, 1988). Figure 8 shows hybridization
by the cDNA clone 10-1 to a 6.5 kb transcript in the
tissues indicated. Total RNA (10 µg) of each tissue
and poly(A)+ RNA (1 µg) of the T84 colonic carcinoma cell
line were separated on a 1% formaldehyde gel. The
positions of the 28S and 18S rRNA bands are indicated.
Arrows indicate the position of transcripts. Sizing was
established by comparison to standard RNA markers (BRL
Labs). HL60 is a human promyelocytic leukemia cell line,
and T84 is a human colon cancer cell line.

Analysis reveals a prominent band of approximately
6.5 kb in size in T84 cells. Similar, strong
hybridization signals were also detected in pancreas and
primary cultures of cells from nasal polyps, suggesting
that the mature mRNA of the putative CF gene is
approximately 6.5 kb. Minor hybridization signals,
probably representing degradation products, were
detected at the lower size ranges, but they varied
between different experiments. Identical results were
obtained with other cDNA clones as probes. Based on the
hybridization band intensity and comparison with those
detected for other transcripts under identical
experimental conditions, it was estimated that the
putative CF transcripts constituted approximately 0.01%
of total mRNA in T84 cells.

A number of other tissues were also surveyed by RNA
gel blot hybridization analysis in an attempt to
correlate the expression pattern of the 10-1 gene with
the pathology of CF. As shown in Figure 8, transcripts,
all of identical size, were found in lung, colon, sweat
glands (cultured epithelial cells), placenta, liver, and
parotid gland, but the signal intensities in these
tissues varied among different preparations and were
generally weaker than those detected in the pancreas and
nasal polyps. For example, hybridization in kidney was
not detected in the preparation shown in Figure 8, but
can be discerned in subsequent repeated assays. No
hybridization signals could be discerned in the brain or
adrenal gland (Figure 8), nor in skin fibroblast and
lymphoblast cell lines.

In summary, expression of the CF gene appeared to
occur in many of the tissues examined, with higher
levels in those tissues severely affected in CF. While
this epithelial tissue-specific expression pattern is in
good agreement with the disease pathology, no
significant difference has been detected in the amount
or size of transcripts from CF and control tissues,
consistent with the assumption that CF mutations are
subtle changes at the nucleotide level.
3.2 THE MAJOR CF MUTATION

Figure 17 shows the DNA sequence at the F508
deletion. On the left is the reverse complement of the
normal sequence from base positions 1649-1664 (as derived
from the cDNA clone T16). The
nucleotide sequence is displayed as the output (in
arbitrary fluorescence intensity units, y-axis) plotted
against time (x-axis) for each of the 2 photomultiplier
tubes (PMT #1 and #2) of a Dupont Genesis 2000™ DNA
analysis system. The corresponding nucleotide sequence
is shown underneath. On the right is the same region
from a mutant sequence (as derived from the cDNA clone
C16). Double-stranded plasmid DNA templates were
prepared by the alkaline lysis procedure. Five µg of
plasmid DNA and 75 ng of oligonucleotide primer were
used in each sequencing reaction according to the
protocol recommended by Dupont, except that the annealing
was done at 45°C for 30 min and the
elongation/termination step was for 10 min at 42°C. The
unincorporated fluorescent nucleotides were removed by
precipitation of the DNA sequencing reaction product
with ethanol in the presence of 2.5 M ammonium acetate
at pH 7.0, followed by one rinse with 70% ethanol. The
primer used for the T16-1 sequencing was a specific
oligonucleotide, 5'-GTTGGCATGCTTTGATGACGCTTC-3', spanning
base positions 1708-1731, and that for C16-1 was the
universal primer SK for the Bluescript vector
(Stratagene). Figure 18 also shows the DNA sequence
around the F508 deletion, as determined by manual
sequencing. The normal sequence from base positions
1726-1651 (from cDNA T16-1) is shown beside the CF
sequence (from cDNA C16-1). The left panel shows the
sequences from the coding strands obtained with the B
primer (5'-GTTTTCCTGGATTATGCCTGGGCAC-3') and the right
panel those from the opposite strand with the D primer
(5'-GTTGGCATGCTTTGATGACGCTTC-3'). The brackets indicate
the three nucleotides in the normal sequence that are absent in
CF (arrowheads). Sequencing was performed as described
in F. Sanger, S. Nicklen and A.R. Coulson, Proc. Natl.
Acad. Sci. U.S.A. 74:5463 (1977).
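Comparing a normal and a mutant read to locate a 3 bp deletion can be sketched as below. The 15-nucleotide fragment is an assumption used only for illustration (intended to represent the region around codon 508; it should be checked against Figure 1), and the helper names are hypothetical. The sketch shows that when the deleted bases fall in a short repeat, more than one placement yields the same mutant sequence, and that either placement removes a single phenylalanine codon while leaving the flanking isoleucine intact.

```python
# Codon table restricted to the codons used in this toy example.
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def deletion_placements(normal: str, mutant: str) -> list[tuple[int, str]]:
    """Every (0-based start, deleted bases) that turns `normal` into `mutant`.

    Assumes `mutant` differs from `normal` by a single contiguous deletion.
    More than one placement is returned when the deletion lies in a repeat.
    """
    k = len(normal) - len(mutant)
    if k <= 0:
        raise ValueError("mutant must be shorter than normal")
    return [
        (i, normal[i:i + k])
        for i in range(len(normal) - k + 1)
        if normal[:i] + normal[i + k:] == mutant
    ]


def translate(cds: str) -> str:
    """Translate complete codons using the restricted table above."""
    return "".join(CODONS[cds[j:j + 3]] for j in range(0, len(cds) - 2, 3))


normal = "ATCATCTTTGGTGTT"   # assumed normal context, read in frame
mutant = "ATCATTGGTGTT"      # same context with 3 bp removed

print(deletion_placements(normal, mutant))   # [(4, 'TCT'), (5, 'CTT')]
print(translate(normal), "->", translate(mutant))   # IIFGV -> IIGV
```

The CTT placement corresponds to the three bases marked in Figure 1; the alternative TCT placement is indistinguishable at the sequence level.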
To investigate the proportion of CF patients
carrying this deletion (F508), genomic DNA samples from
patients and their parents were each amplified with
oligonucleotide primers flanking the mutation in a
polymerase chain reaction and hybridized to 32P-labeled
oligonucleotides specific for the normal and the
mutant sequences (Figure 12). The results
of this analysis are shown in Table 2.
TABLE 2

DISTRIBUTION OF CF AND NON-CF (N) CHROMOSOMES
WITH AND WITHOUT THE 3 bp DELETION

a.                             CF chromosomes    N chromosomes
   Without the deletion              69               198
   With the deletion                145                 0
   Total                            214               198

b.                             CF chromosomes
                               With the 3 bp      Without the
                               deletion           deletion
   CF-PI                             62                24
   CF-PS                              5                 9
   Unclassified                      78                36
   Total                        145 (68%)          69 (32%)



The data for the CF-PI (pancreatic insufficient)
and CF-PS (pancreatic sufficient) chromosomes were
derived from the CF families used in our linkage
analysis. These families were originally selected
without knowledge regarding PI or PS; the 15 CF-PS
families subsequently identified were not included as
part of this calculation. The unclassified CF
chromosomes were obtained from the DNA Diagnosis
Laboratory at the Hospital for Sick Children in Toronto,
for which pancreatic function data were not
available.

It can be seen that 68% (145/214) of CF chromosomes
in the general patient population had the F508 deletion
(Table 2). In contrast, none (0/198) of the N
chromosomes had the deletion (Table 2; χ² = 207,
p<10" 57 » 5), suggesting that this sequence alteration is
specific to CF and that it is the major mutation causing
the disease. No recombination has been detected between
the F508 deletion and CF.
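The χ² values quoted for Table 2, and for the Group I haplotype comparison later in this section (62 of 70 CF versus 0 of 14 N chromosomes carrying the deletion), can be recomputed from the 2 x 2 counts. A minimal sketch, assuming a Pearson χ² with 1 degree of freedom and no continuity correction (the text does not state which variant was used):

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Table 2: rows = with / without the deletion; columns = CF / N chromosomes.
table2 = chi_square_2x2(145, 0, 69, 198)
# Group I (AA) haplotypes: 62 of 70 CF vs 0 of 14 N chromosomes carry the deletion.
group1 = chi_square_2x2(62, 0, 70 - 62, 14)

print(f"Table 2: chi2 = {table2:.1f}, p = {p_value_1df(table2):.1e}")
print(f"Group I: chi2 = {group1:.1f}, p = {p_value_1df(group1):.1e}")
```

Under this assumption the statistics come out at about 207 and 47.3, matching the values given in the text.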

Other sequence differences were noted between the
normal (T16-4.5) and CF (C1-1/5) cDNA clones. At base
position 2629, T16-4.5 showed a C and C1-1/5 had a T,
resulting in a Leu to Phe change at the amino acid
level. At position 4555, the base was G in T16-4.5 but
A in C1-1/5 (Val to Met). These findings are believed
to represent sequence polymorphisms. Specific
oligonucleotide hybridization analysis of patient/family
DNA will establish whether these represent other possible mutations.
Additional nucleotide differences were observed in the
3' untranslated region between different cDNA clones
and the genomic DNA sequence. Such differences, and
other possible sequence modifications (for example,
those due to normal sequence polymorphisms and
cloning artefacts), are essentially equivalent to the
sequence described in Figure 1 in terms of its function
and its commercial applications.
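Because the ORF starts at base 133 and encodes 1480 amino acids, any base position in Figure 1 can be converted to a codon number and a position within that codon. The sketch below assumes that the stop codon immediately follows codon 1480 (this is inferred, not stated in the text); the helper names are hypothetical. It places both substitutions at the first base of a codon (codons 833 and 1475) and puts codon 508 at bases 1654-1656, inside the 1649-1664 window shown in Figure 17.

```python
CDS_START = 133   # first base of the initiating ATG (positions 133-135, Figure 1)
N_CODONS = 1480   # length of the deduced polypeptide in amino acids


def codon_of(position: int) -> tuple[int, int]:
    """Map a Figure 1 base position to (codon number, position within codon).

    Both numbers are 1-based; codon 1 is the initiating methionine.
    """
    offset = position - CDS_START
    if not 0 <= offset < 3 * N_CODONS:
        raise ValueError(f"position {position} is outside the coding region")
    return offset // 3 + 1, offset % 3 + 1


def codon_span(codon: int) -> tuple[int, int]:
    """First and last Figure 1 base positions of a given codon."""
    first = CDS_START + 3 * (codon - 1)
    return first, first + 2


for pos in (2629, 4555):
    print(pos, codon_of(pos))
print("codon 508 spans", codon_span(508))
# Assumes the stop codon follows codon 1480 directly.
print("stop codon", codon_span(N_CODONS + 1))
```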

The extensive genetic and physical mapping data
have directed molecular cloning studies to focus on a
small segment of DNA on chromosome 7. Because of the
lack of chromosome deletions and rearrangements in CF
and the lack of a well-developed functional assay for
the CF gene product, the identification of the CF gene
required a detailed characterization of the locus itself
and comparison between the CF and normal (N) alleles.
Random, phenotypically normal individuals could not be
included as controls in the comparison due to the high
frequency of symptomless carriers in the population. As
a result, only parents of CF patients, each of whom by
definition carries an N and a CF chromosome, were
suitable for the analysis. Moreover, because of the
strong allelic association observed between CF and some
of the closely linked DNA markers, it was necessary to
exclude the possibility that sequence differences
detected between N and CF were polymorphisms associated
with the disease locus.
3.3 IDENTIFICATION OF RFLPs AND FAMILY STUDIES

To determine the relationship to CF of each of the DNA
segments isolated from the chromosome walking and
jumping experiments, restriction fragment length
polymorphisms (RFLPs) were identified and used to study
families where crossover events had previously been
detected between CF and other flanking DNA markers. As
shown in Figure 14, a total of 18 RFLPs were detected in
the 500 kb region; 17 of them (from E6 to CE1.0) are listed
in Table 3, and some of them correspond to markers
previously reported.

Five of the RFLPs, detected by the probes 10-1X.6, T6/20, H1.3 and
CE1.0, were identified with cDNA and genomic DNA probes
derived from the putative CF gene. The RFLP data are
presented in Table 3, with markers in the MET and D7S8
regions included for comparison. The physical distances
between these markers, as well as their relationship to
the MET and D7S8 regions, are shown in Figure 14.
NOTES FOR TABLE 3

(a) The numbers of N and CF-PI (CF with pancreatic
insufficiency) chromosomes were derived from the
parents in the families used in linkage analysis.

(b) Standardized association (A), which is less
sensitive to fluctuation of the allele frequencies, is
used for the comparison: Yule's association coefficient
A = (ad - bc)/(ad + bc), where a, b, c and d are the
numbers of N chromosomes with DNA marker allele 1,
CF with 1, N with 2, and CF with 2, respectively.
Relative risk can be calculated using the
relationship RR = (1 + A)/(1 - A), or its reverse.

(c) Allelic association, calculated according to
(1984) assuming a frequency of 0.02 for CF
chromosomes in the population, is included for
comparison.

(d) Only a limited number of recombinant families were
available for the analysis, as was expected from the
close distance between the markers studied and CF;
alternative approaches were therefore necessary in
further fine mapping of CF.
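Note (b) defines Yule's coefficient and the relative risk derived from it. A minimal sketch of both formulas follows; the counts in the example are hypothetical and are not taken from Table 3. Algebraically, (1 + A)/(1 - A) reduces to the odds ratio ad/bc, which the sketch also illustrates.

```python
def yule_association(a: int, b: int, c: int, d: int) -> float:
    """Yule's association coefficient A = (ad - bc) / (ad + bc).

    Following note (b): a = N chromosomes with marker allele 1, b = CF with
    allele 1, c = N with allele 2, d = CF with allele 2.
    """
    ad, bc = a * d, b * c
    if ad + bc == 0:
        raise ValueError("A is undefined when ad + bc == 0")
    return (ad - bc) / (ad + bc)


def relative_risk(A: float) -> float:
    """RR = (1 + A) / (1 - A); algebraically this equals the odds ratio ad / bc."""
    if A == 1:
        raise ValueError("RR is unbounded when A == 1 (bc == 0)")
    return (1 + A) / (1 - A)


# Hypothetical counts for illustration only (not taken from Table 3).
A = yule_association(a=30, b=10, c=10, d=30)
print(f"A = {A:.2f}, RR = {relative_risk(A):.1f}")   # A = 0.80, RR = 9.0
```

A ranges from -1 to +1 regardless of how common each allele is, which is why it is preferred over raw allele counts when comparing markers with different allele frequencies.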

Allelic association (linkage disequilibrium) has
been detected for many closely linked DNA markers.
While the utility of using allelic association for
measuring distance is uncertain, an overall
correlation has been observed between CF and the
flanking DNA markers. A strong association with CF was
noted for the closer DNA markers, D7S23 and D7S122,
whereas little or no association was detected for the
more distant markers MET, D7S8 or D7S424 (see Figure 1).

As shown in Table 3, the degree of association
between DNA markers and CF (as measured by Yule's
association coefficient) increased from 0.35 for metH
and 0.17 for J32 to 0.91 for 10-1X.6 (only CF-PI
patient families were used in the analysis, as they
appeared to be genetically more homogeneous than CF-PS).
The association coefficients appeared to be rather
constant over the 300 kb from EG1.4 to H1.3; the
fluctuations detected at several locations, most notably
at H2.3A, E4.1 and T6/20, were probably due to the
variation in the allelic distribution among the N
chromosomes (see Table 2). These data are therefore
consistent with the result from the study of recombinant
families (see Figure 14). A similar conclusion could
also be drawn by inspection of the extended DNA marker
haplotypes associated with the CF chromosomes (see
below). However, the strong allelic association
detected over the large physical distance between EG1.4
and H1.3 did not allow further refined mapping of the CF
gene. Since J44 was the last genomic DNA clone
isolated by chromosome walking and jumping before a cDNA
clone was identified, the strong allelic association
detected for the JG2E1-J44 interval prompted us to
search for candidate gene sequences over this entire
interval. It is of interest to note that the highest
degree of allelic association was, in fact, detected
between CF and the 2 RFLPs detected by 10-1X.6, a region
near the major CF mutation.

Table 4 shows pairwise allelic association between
DNA markers closely linked to CF. The average number of
chromosomes used in these calculations was 75-80, and
only chromosomes from CF-PI families were used in
scoring CF chromosomes. Similar results were obtained
when Yule's standardized association (A) was used.
Strong allelic association was also detected among
subgroups of RFLPs on both the CF and N chromosomes. As
shown in Table 4, the DNA markers that are physically
close to each other generally appeared to have strong
association with each other. For example, strong (in
some cases almost complete) allelic association was
detected between adjacent markers E6 and E7, between
pH131 and W3D1.4, between the AccI and HaeIII polymorphic
sites detected by 10-1X.6, and amongst EG1.4, JG2E1,
E2.6 (E.9), E2.8 and E4.1. The two groups of distal
markers in the MET and D7S8 regions also showed some
degree of linkage disequilibrium among themselves, but
they showed little association with markers from E6 to
CE1.0, consistent with the distant locations of MET and
D7S8. On the other hand, the lack of association
between DNA markers that are physically close may
indicate the presence of recombination hot spots.
Examples of these potential hot spots are the region
between E7 and pH131, the region around H2.3A, and the region
between J44 and the regions covered by the probes 10-1X.6 and T6/20 (see
Figure 14). These regions, containing frequent
recombination breakpoints, were useful in the subsequent
analysis of extended haplotype data for the CF region.
HAPLOTYPE ANALYSIS 

Extended haplotypes based on 23 DNA markers were 
generated for the CF and N chromosomes in the collection 
of families previously used for linkage analysis. 
Assuming recombination between chromosomes of different 
haplotypes, it was possible to construct several 
lineages of the observed CF chromosomes and, also, to 
predict the location of the disease locus. 

To obtain further information useful for 
understanding the nature of different CF mutations, the 
F508 deletion data were correlated with the extended DNA 
marker haplotypes. As shown in Table 5, five major 
groups of N and CF haplotypes could be defined by the 
RFLPs within or immediately adjacent to the putative CF
gene (regions 6-8).
(a) The extended haplotype data are derived from the CF
families used in previous linkage studies (see footnote
(a) of Table 3), with additional CF-PS families collected
subsequently (Kerem et al., Am. J. Hum. Genet. 44:827 (1989)).
The data are shown in groups (regions) to reduce space.
The regions are assigned primarily according to the pairwise
association data shown in Table 4, with regions 6-8
spanning the putative CF locus (the F508 deletion is
between regions 6 and 7). A dash (-) is shown at the
region where the haplotype has not been determined due to
incomplete data or inability to establish phase.
Alternative haplotype assignments are also given where
data are incomplete. Unclassified includes those
chromosomes with more than 3 unknown assignments. The
haplotype definitions for each of the 9 regions are:

Region 1-   metD    metD    metH
            BanI    TaqI    TaqI
      A =    1       1       1
      B =    2       1       2
      C =    1       1       2
      D =    2       2       1
      E =    1       2
      F =    2       1       1
      G =    2       2       2

Region 2-   E6      E7      pH131   W3D1.4
                            HinfI   HindIII
      A =    1       2       2       2
      B =    2       1       1       1
      C =    1       2       1       1
      D =    2       1       2       2
      E =    2       2       2       1
      F =    2       2       1       1
      G =    1       2       1       2
      H =    1       1       2       2

Region 3-   H2.3A
            TaqI
      A =    1
      B =    2

Region 4-   EG1.4   EG1.4   JG2E1
            HincII  BalI    PstI
      A =    1       1       2
      B =    2       2       1
      C =    2       2       2
      D =    1       1       1
      E =    1       2       1

Region 5-   E2.6    E2.8    E4.1
                    NcoI    MspI
      A =    2       1       2
      B =    1       2       1
      C =    2       2       2

Region 6-   J44     10-1X.6 10-1X.6
            XbaI    AccI    HaeIII
      A =    1       2       1
      B =    2       1       2
      C =    1       1       2
      D =    1       2       2
      E =    2       2       2
      F =    2       2       1

Region 7-   T6/20
            MspI
      A =    1
      B =    2

Region 8-   H1.3    CE1.0
            NcoI    NdeI
      A =    2       1
      B =    1       2
      C =    1       1
      D =    2       2

Region 9-   J32     J3.11   J29
                    MspI    PvuII
      A =    1       1       1
      B =    2       2       2
      C =    2       1       2
      D =    2       2       1
      E =    2       1       1

(b) Number of chromosomes scored in each class:
CF-PI(F) = CF chromosomes from CF-PI patients with the F508 deletion;
CF-PS(F) = CF chromosomes from CF-PS patients with the F508 deletion;
CF-PI = other CF chromosomes from CF-PI patients;
CF-PS = other CF chromosomes from CF-PS patients;
N = normal chromosomes derived from carrier parents.
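Each region definition above is effectively a lookup table from marker alleles to a haplotype letter, with a dash for undetermined regions. A minimal sketch of that lookup follows, assuming only three of the simpler regions (3, 7 and 8) as transcribed above; the helper names and the example chromosome are hypothetical.

```python
# Region definitions transcribed from the Table 5 notes for three regions only.
# Keys are marker-allele tuples in the order the markers are listed.
REGION_DEFINITIONS = {
    3: {(1,): "A", (2,): "B"},                                 # H2.3A TaqI
    7: {(1,): "A", (2,): "B"},                                 # T6/20 MspI
    8: {(2, 1): "A", (1, 2): "B", (1, 1): "C", (2, 2): "D"},   # H1.3 NcoI, CE1.0 NdeI
}


def region_code(region: int, alleles) -> str:
    """Letter for one region, or '-' if alleles are missing or unmatched."""
    if alleles is None:
        return "-"
    return REGION_DEFINITIONS[region].get(tuple(alleles), "-")


def haplotype_code(calls: dict[int, tuple]) -> str:
    """Concatenate region letters in region order (a hypothetical helper)."""
    return "".join(region_code(r, calls.get(r)) for r in sorted(REGION_DEFINITIONS))


# Hypothetical chromosome: region 3 allele 1, region 7 unphased, region 8 alleles (1, 2).
print(haplotype_code({3: (1,), 7: None, 8: (1, 2)}))   # A-B
```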
It was apparent that most recombinations between
haplotypes occurred between regions 1 and 2 and between
regions 8 and 9, again in good agreement with the
relatively long physical distance between these regions.
Other, less frequent, breakpoints were noted between
short distance intervals, and they generally corresponded
to the hot spots identified by pairwise allelic
association studies as shown above. The striking result
was that the F508 deletion associated almost
exclusively with Group I, the most frequent CF
haplotype, supporting the position that this deletion
constitutes the major mutation in CF. More important,
while the F508 deletion was detected in 89% (62/70) of
the CF chromosomes with the AA haplotype (corresponding
to the two regions, 6 and 7, flanking the deletion), none
was found in the 14 N chromosomes within the same group
(χ² = 47.3, p < 10^-4). The F508 deletion was therefore
not a common sequence polymorphism associated with the
core of the Group I haplotype (see Table 5).

One of the CF chromosomes detected by the specific
oligonucleotide probe for the F508 deletion was found
to belong to a different haplotype group (Group III).
Neither the 9 other CF chromosomes nor the 17 N chromosomes
in the same group hybridized to the probe. This
specific hybridization result suggests that the mutation
harbored on this chromosome is similar to F508.
Although recombination or gene conversion are possible
mechanisms to explain the presence of this deletion on a
non-Group I haplotype, it is more likely that these 2
Group III chromosomes represent a recurrent mutation
event, a situation similar to the βS and βE mutations
at the β-globin locus.

Together, the results of the oligonucleotide
hybridization study and the haplotype analysis support
the conclusion that the gene locus described here is the CF
gene and that the 3 bp (F508) deletion is the most
common mutation in CF.
3.5 OTHER CF MUTATIONS

The association of the F508 deletion with 1 common
and 1 rare CF haplotype provided further insight into
the number of mutational events that could contribute to
the present patient population. Based on the extensive
haplotype data, the 2 original chromosomes in which the
F508 deletion occurred are likely to carry the haplotypes
-AAAAAAA- (Group Ia) and -CBAACBA- (Group IIIa), as
defined in Table 5. The other Group I CF chromosomes
carrying the deletion are probably recombination
products derived from the original chromosome. If the
CF chromosomes in each haplotype group are considered to
be derived from the same origin, only 3-4 additional
mutational events would be predicted (see Table 5).
However, since many of the CF chromosomes in the same
group are markedly different from each other, further
subdivision within each group is possible. As a result,
a higher number of independent mutational events could
be considered, and the data suggest that at least 7
additional putative mutations also contribute to the
CF-PI phenotype (see Table 4). The mutations leading to
the CF-PS subgroup are probably more heterogeneous.

The 7 additional CF-PI mutations are represented by
the haplotypes: -CAAAAAA- (Group Ib), -CABCAAD- (Group
Ic), BBBAC- (Group IIa), -CABBBAB- (Group Va).
Although the molecular defect in each of these mutations
has yet to be defined, it is clear that none of these
mutations severely affects the region corresponding to
the oligonucleotide binding sites used in the
PCR/hybridization experiment.
3.6 PANCREATIC SUFFICIENCY

CF-PS is defined clinically as sufficient pancreatic exocrine function for digestion of food; however, the level of residual pancreatic enzyme activity in the digestive system varies from patient to patient. Previous haplotype data suggested that the CF-PI and CF-PS phenotypes are due to different mutant alleles. Although the basic biochemical defect in CF has yet to be defined, it is possible that the residual pancreatic enzyme activity in CF-PS patients is a direct reflection of the activity of the mutant CF gene product. Thus, the residual exocrine function conferred by a mild (CF-PS) allele, although much lower than that of the normal gene product, would constitute a dominant phenotype over that of more severe (CF-PI) mutations with little or no function. It follows that only patients carrying 2 copies of severe alleles would be CF-PI and that patients carrying 1 or 2 mild alleles would be CF-PS.

To test the above hypothesis, the proportion of CF patients carrying the F508 deletion could be utilized. Assuming that a severe mutation is recessive to a mild mutation, and that CF alleles are distributed among the patient population according to the Hardy-Weinberg law, the frequency of severe alleles could be estimated to be 0.92 and that of the mild alleles (M), 0.08 (see Table 6).
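The estimate above can be reproduced with a short Python sketch. It uses only the inputs stated in the text: an F frequency of 0.68 among CF chromosomes, a CF-PI proportion of 0.85 (so that all severe alleles together have frequency 0.85^(1/2)), and Hardy-Weinberg proportions. The function name `genotype_proportions` is illustrative. With these inputs the computed genotype proportions agree with Table 6 to two decimal places; the small third-decimal differences are consistent with rounding of the F frequency.

```python
import math


def genotype_proportions(f_freq: float, pi_fraction: float) -> dict[str, float]:
    """Hardy-Weinberg genotype proportions among CF patients.

    f_freq: frequency of the F508 deletion (F) among CF chromosomes.
    pi_fraction: observed proportion of CF patients who are pancreatic insufficient.
    """
    severe = math.sqrt(pi_fraction)  # all severe alleles together (F + S)
    s_freq = severe - f_freq         # remaining, uncharacterized severe alleles (S)
    m_freq = 1.0 - severe            # mild alleles (M)
    return {
        "FF": f_freq ** 2,
        "FS": 2 * f_freq * s_freq,
        "SS": s_freq ** 2,
        "FM": 2 * f_freq * m_freq,
        "SM": 2 * s_freq * m_freq,
        "MM": m_freq ** 2,
    }


props = genotype_proportions(0.68, 0.85)
pi_total = props["FF"] + props["FS"] + props["SS"]  # genotypes with 2 severe alleles
for genotype, p in props.items():
    print(f"{genotype}: {p:.3f}")
print(f"PI total: {pi_total:.3f}")
```

Because the three PI genotypes together equal the square of the severe-allele frequency, the PI total returns the 0.85 used as input, which is a quick consistency check on the model.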
TABLE 6

POPULATION ANALYSIS OF CF-PI AND CF-PS

Patients                      Genotype(a)  Expected       Observed     Expected
                                           frequency(b)   number(c)    number(d)
Pancreatic insufficient (PI)  FF           0.459          21           21.1
                              FS           0.331          14           15.2
                              SS           0.060           4            2.7
                              Total        0.850          39
Pancreatic sufficient (PS)    FM           0.106          15(e)        14.8
                              SM           0.038           6            6.2
                              MM           0.006
                              Total        0.150          21

(a) Allele designations: F = the 3 bp deletion (deletion of phenylalanine at amino acid position 508); S = uncharacterized severe mutant alleles; M = uncharacterized mild mutant alleles.

(b) Assuming that the CF-PI mutant phenotype is recessive to the CF-PS mutant phenotype, the frequency of CF-PI mutant alleles, including the 3 bp deletion, could be estimated from the observed proportion of CF-PI patients in the CF clinic [Corey et al J. Pediatr. 115:274 (1989)], i.e., (0.85)^(1/2) = 0.92. The observed allele frequency for F in the total CF population is 0.68 (Table 3); the frequency for S = 0.92 - 0.68 = 0.24; the frequency for M = 1 - 0.92 = 0.08. The frequency for each genotype was then calculated by using the Hardy-Weinberg law.

(c) The number of CF-PI and CF-PS patients in each category was obtained by oligonucleotide hybridization analysis as illustrated in Figure 15. The patients were from the CF families used in our linkage analysis, with 14 additional CF-PS patients/families from a subsequent study. Since SM and MM could not be distinguished genotypically or phenotypically, they were combined in the analysis.

(d) The expected numbers were calculated for CF-PI and CF-PS after normalization within each group. The χ² of fit is 0.86, d.f. = 3, 0.74 < p < 0.90.

(e) This number is higher than would be expected (15 observed vs. 9.6 expected) if the F508 deletion is in Hardy-Weinberg equilibrium among all CF chromosomes (χ² = 6.48, d.f. = 1, p < 0.011).
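Note (d) describes normalizing the expected genotype frequencies within each phenotype group before comparing them with the observed counts. A minimal Python sketch of that step follows, using the Table 6 frequencies and pooling SM with MM as note (c) describes; the function name `expected_counts` is illustrative. The resulting expected numbers match the last column of Table 6 to within about 0.1 patient.

```python
def expected_counts(freqs: dict[str, float], observed_total: int) -> dict[str, float]:
    """Scale genotype frequencies so they sum to the observed number of patients in one group."""
    group_total = sum(freqs.values())
    return {genotype: observed_total * f / group_total for genotype, f in freqs.items()}


# Expected frequencies from Table 6; SM and MM are pooled, as in note (c).
pi = expected_counts({"FF": 0.459, "FS": 0.331, "SS": 0.060}, observed_total=39)
ps = expected_counts({"FM": 0.106, "SM+MM": 0.038 + 0.006}, observed_total=21)

for group in (pi, ps):
    for genotype, n in group.items():
        print(f"{genotype}: {n:.2f}")
```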
Since the majority of CF-PI patients were found to be homozygous for the F508 mutation (F), it was reasonable to assume that this mutation corresponded to one of the severe alleles. Given the observed frequency of F (0.68) in the studied CF population, the frequency of the remaining severe alleles (S) could be derived. The proportions of FF, SS, MM, FS, FM and SM patients were then calculated. Since individuals with SM and MM could not be distinguished phenotypically or genotypically, they were combined in the analysis. As shown in Table 6, the observed frequencies for all 5 groups of patients were as expected from this hypothesis.

The above analysis thus provides strong support for our position that CF-PI is due to the presence of 2 severe alleles and that a CF-PS patient carries either one severe and one mild allele or 2 mild alleles. This model also explains the lower frequency of the F508 deletion in the CF-PS than in the CF-PI population and the excess number of CF-PS patients with one copy of the deletion (see note (e) in Table 6).

Given the predicted dominant phenotype conferred by the M alleles, it was necessary to examine the CF chromosomes in CF-PS patients individually in order to identify those carrying the M alleles. As shown in Table 7, 5 of the 7 representative CF-PS patients carry one copy of the F508 deletion; at least 5 different haplotypes could be assigned to the other CF chromosomes.
TABLE 7

F (Group Ia)
S (predicted, Group Ib)

F (Group Ia)
F (Group Ia)

F (Group Ia)
F (Group Ia)

F (Group Ia)
F (Group Ia)

(a) The haplotype definitions are the same as in Table 5.

(b) Allele designations are the same as in Table 6: F = the F508 deletion; S = uncharacterized severe mutant allele; M = uncharacterized mild mutant allele.
These latter observations provide further support for the conclusion that the majority of CF-PS patients are compound heterozygotes.

CFTR PROTEIN
As discussed with respect to the DNA sequence of Figure 1, analysis of the sequence of the overlapping cDNA clones predicted an unprocessed polypeptide of 1480 amino acids with a molecular mass of 168,138 daltons. As described later, polymorphisms in the protein, such as substitutions or deletions of certain amino acids, can cause its molecular weight to vary. The molecular weight will also change with the addition of carbohydrate units to form a glycoprotein. It is also understood that the functional protein in the cell will be similar to the unprocessed polypeptide, but may be modified by cell metabolism.
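A predicted molecular mass such as the 168,138 daltons above is obtained by summing residue masses over the translated sequence. The Python sketch below does this with standard average residue masses and one water for the free termini, and ignores glycosylation and other processing. The six-residue peptide is hypothetical and is not quoted from Figure 1; it only shows that deleting one phenylalanine lowers the predicted mass by one Phe residue mass (about 147 Da).

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mass(sequence: str) -> float:
    """Average molecular mass of an unmodified polypeptide (no carbohydrate)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


# Hypothetical peptide with and without one phenylalanine.
normal = "NIIFGV"
deleted = "NIIGV"
print(round(average_mass(normal) - average_mass(deleted), 4))
```

Applied to the full 1480-residue sequence of Figure 1, the same function would give the unprocessed polypeptide mass; that calculation is not reproduced here.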

Accordingly, the invention provides purified normal CFTR polypeptide characterized by a molecular weight of about 170,000 daltons and having epithelial cell transmembrane ion conductance activity. The normal CFTR polypeptide, which is substantially free of other human proteins, is encoded by the aforementioned DNA sequences and, according to one embodiment, by that of Figure 1. Such polypeptide displays the immunological or biological activity of normal CFTR polypeptide. As will be discussed later, the CFTR polypeptide and fragments thereof may be made by chemical or enzymatic peptide synthesis or expressed in an appropriate cultured cell system. The invention also provides purified mutant CFTR polypeptide which is characterized by cystic fibrosis-associated activity in human epithelial cells. Such mutant CFTR polypeptide, when substantially free of other human proteins, can be encoded by the mutant DNA sequence.
STRUCTURE OF CFTR
The most characteristic feature of the predicted protein is the presence of two repeated motifs, each of which consists of a set of amino acid residues capable of spanning the membrane several times, followed by a sequence resembling consensus nucleotide (ATP)-binding folds (NBFs) (Figures 11, 12 and 16). These characteristics are remarkably similar to those of the mammalian multidrug-resistance P-glycoprotein and a number of other membrane-associated proteins, implying that the predicted CF gene product is likely to be involved in the transport of substances (ions) across the membrane and is probably a member of a membrane protein superfamily.

Figure 13 is a schematic model of the predicted CFTR protein. The 6 membrane-spanning helices in each half of the molecule are depicted as cylinders. The inner, cytoplasmically oriented NBFs are shown as hatched spheres, with slots indicating the means of entry by the nucleotide. The large polar R-domain, which links the two halves, is represented by a stippled sphere. Charged individual amino acids within the transmembrane segments and on the R-domain surface are depicted as small circles containing the charge sign. Net charges on the internal and external loops joining the membrane cylinders and on regions of the NBFs are contained in open squares. Sites for phosphorylation by protein kinases A or C are shown by closed and open triangles, respectively. K, R, H, D and E are standard nomenclature for the amino acids lysine, arginine, histidine, aspartic acid and glutamic acid, respectively.

Each of the predicted membrane-associated regions of the CFTR protein consists of 6 highly hydrophobic segments capable of spanning a lipid bilayer according to the algorithms of Kyte and Doolittle and of Garnier et al [J. Mol. Biol. 120, 97 (1978)] (Figure 13). The membrane-associated regions are each followed by a large hydrophilic region containing the NBFs. Based on sequence alignment with other known nucleotide-binding proteins, each of the putative NBFs in CFTR comprises at least 150 residues (Figure 13). The 3 bp deletion detected in the majority of CF patients is located between the 2 most highly conserved segments of the first NBF in CFTR. The amino acid sequence identity between the region surrounding the phenylalanine deletion and the corresponding regions of a number of other proteins suggests that this region is of functional importance (Figure 16). A hydrophobic amino acid, usually one with an aromatic side chain, is present in most of these proteins at the position corresponding to F508 of the CFTR protein. It is understood that amino acid polymorphisms may exist as a result of DNA polymorphisms.
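The Kyte and Doolittle approach mentioned above averages a per-residue hydropathy value over a sliding window and flags long, strongly hydrophobic stretches as candidate membrane-spanning segments. The Python sketch below implements that scan. The window of 19 residues and the cutoff of 1.6 are commonly used choices and are assumptions here, since the text does not state the parameters used; the toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; entry i covers residues i..i+window-1 (0-based)."""
    values = [KD_SCALE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping windows whose mean exceeds threshold into 0-based half-open spans."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the current segment
        else:
            segments.append((start, end))
    return segments


# Hypothetical toy sequence: one hydrophobic stretch flanked by lysines.
toy = "K" * 10 + "L" * 25 + "K" * 10
print(candidate_tm_segments(toy))
```

Because windows are merged, a reported span can be longer than the hydrophobic core itself; a real analysis of the Figure 1 sequence would inspect the profile rather than rely on the cutoff alone.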

Figure 16 shows the alignment of the 3 most conserved segments of the extended NBFs of CFTR with comparable regions of other proteins. These 3 segments consist of residues 433-473, 488-513 and 542-584 of the N-terminal half and 1219-1259, 1277-1302 and 1340-1382 of the C-terminal half of CFTR. The heavy overlining points out the regions of greatest similarity. Additional general homology can be seen even without the introduction of gaps.

Despite the overall symmetry in the structure of the protein and the sequence conservation of the NBFs, sequence homology between the two halves of the predicted CFTR protein is modest. This is demonstrated in Figure 12, where amino acids 1-1480 are represented on each axis. Lines on either side of the identity diagonal indicate the positions of internal similarities. While four sets of internal sequence identity can be detected in Figure 12 using the Dayhoff scoring matrix as applied by Lawrence et al. [C. B. Lawrence, D. A. Goldman, and R. T. Hood, Bull. Math. Biol. 48, 569 (1986)], three of these are apparent only at low threshold settings for standard deviation. The strongest identity is between sequences at the carboxyl ends of the NBFs. Of the 66 residues aligned, 27% are identical and another 11% are functionally similar. The overall weak internal homology is in contrast to the much higher degree (>70%) in P-glycoprotein, for which a gene duplication hypothesis has been proposed (Gros et al, Cell 47, 371, 1986; C. Chen et al, Cell 47, 381, 1986; Gerlach et al, Nature 324, 485, 1986; Gros et al, Mol. Cell. Biol. 8, 2770, 1988). The lack of conservation in the relative positions of the exon-intron boundaries may argue against such a model for CFTR (Figure 2).
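A self-comparison dot plot like Figure 12 compares every window of the sequence with every other window and marks pairs that score above a threshold; internal repeats appear as lines parallel to the main diagonal. The Python sketch below is a simplified version of that idea. It swaps the Dayhoff scoring matrix used by Lawrence et al. for a plain count of identical residues, and the window size and identity cutoff are arbitrary assumptions; the function name `dot_plot` and the toy sequence are hypothetical.

```python
def dot_plot(seq: str, window: int = 6, min_identities: int = 6) -> set[tuple[int, int]]:
    """Off-diagonal window pairs (i < j) sharing at least min_identities identical residues.

    Ungapped comparison: residue k of window i is compared with residue k of window j.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    hits = set()
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            identities = sum(seq[i + k] == seq[j + k] for k in range(window))
            if identities >= min_identities:
                hits.add((i, j))
    return hits


# Hypothetical sequence with one internal repeat separated by a glycine spacer.
print(sorted(dot_plot("MKVLATGGGGMKVLAT")))
```

Lowering `min_identities` is the analogue of lowering the threshold setting in Figure 12: weaker internal similarities become visible, at the cost of more background hits.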

Since there is apparently no signal peptide sequence at the amino terminus of CFTR, the highly charged hydrophilic segment preceding the first transmembrane sequence is probably oriented in the cytoplasm. Each of the 2 sets of hydrophobic helices is expected to form 3 transversing loops across the membrane, and little of the entire protein sequence is expected to be exposed to the exterior surface, except the region between transmembrane segments 7 and 8. It is of interest to note that the latter region contains two potential sites for N-linked glycosylation.

Each of the membrane-associated regions is followed by an NBF, as indicated above. In addition, a highly charged cytoplasmic domain can be identified in the middle of the predicted CFTR polypeptide, linking the 2 halves of the protein. This domain, named the R-domain, is operationally defined by a single large exon in which 69 of the 241 amino acids are polar residues arranged in alternating clusters of positive and negative charges. Moreover, 9 of the 10 consensus sequences required for phosphorylation by protein kinase A (PKA), and 7 of the potential substrate sites for protein kinase C (PKC), found in CFTR are located in this exon.
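Counting consensus phosphorylation sites, as in the paragraph above, amounts to scanning the protein sequence for short sequence motifs. The document does not state which consensus definitions were used, so the Python sketch below assumes the PROSITE-style patterns [RK](2)-x-[ST] for PKA and [ST]-x-[RK] for PKC. It reports the 1-based position of the phosphorylatable serine or threonine and uses regular-expression lookaheads so that overlapping motifs are all found. The test strings are hypothetical.

```python
import re

# Assumed consensus patterns (PROSITE style), not taken from this document.
PKA_SITE = re.compile(r"(?=[RK]{2}.[ST])")  # Arg/Lys, Arg/Lys, any, Ser/Thr
PKC_SITE = re.compile(r"(?=[ST].[RK])")     # Ser/Thr, any, Arg/Lys


def pka_sites(seq: str) -> list[int]:
    """1-based positions of the Ser/Thr in each PKA consensus motif."""
    return [m.start() + 4 for m in PKA_SITE.finditer(seq.upper())]


def pkc_sites(seq: str) -> list[int]:
    """1-based positions of the Ser/Thr in each PKC consensus motif."""
    return [m.start() + 1 for m in PKC_SITE.finditer(seq.upper())]


print(pka_sites("GRRASG"), pkc_sites("ASGKT"))
```

To reproduce counts like those quoted for the R-domain, one would run both functions over the full sequence of Figure 1 and keep the hits that fall inside the R-domain exon; that step is not shown here.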
FUNCTION OF CFTR

Properties of CFTR can be derived from comparison to other membrane-associated proteins (Figure 16). In addition to the overall structural similarity with the mammalian P-glycoprotein, each of the two predicted domains in CFTR also shows remarkable resemblance to the single-domain structure of hemolysin B of E. coli and the product of the white gene of Drosophila. These latter proteins are involved in the transport of the lytic peptide of the hemolysin system and of eye pigment molecules, respectively. BtuD, of the vitamin B12 transport system of E. coli, and MbpX, a liverwort chloroplast gene product of unknown function, also have a similar structural motif. Furthermore, the CFTR protein shares structural similarity with several of the periplasmic solute transport systems of Gram-negative bacteria, where the transmembrane region and the ATP-binding folds are contained in separate proteins which function in concert with a third, substrate-binding polypeptide.

The overall structural arrangement of the transmembrane domains in CFTR is similar to that of several cation channel proteins and some cation-translocating ATPases, as well as the recently described adenylate cyclase of bovine brain. The functional significance of this topological classification, consisting of 6 transmembrane domains, remains speculative.

Short regions of sequence identity have also been detected between the putative transmembrane regions of CFTR and other membrane-spanning proteins. Interestingly, an 18-amino-acid sequence situated approximately 50 residues from the carboxyl terminus of CFTR is identical at 12 of its positions to a sequence in the raf serine/threonine kinase proto-oncogene of Xenopus laevis.

Finally, an amino acid sequence identity (10/13 conserved residues) has been noted between a hydrophilic segment (positions 701-713) within the highly charged R-domain of CFTR and a region immediately preceding the first transmembrane loop of the sodium channels in both rat brain and eel. The charged R-domain of CFTR is not shared with the topologically closely related P-glycoprotein; the 241 amino acid linking peptide is apparently the major difference between the two proteins.
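Identity figures such as 10/13 here, 12 of 18 positions for the raf comparison, or 27% of 66 aligned residues (about 18 identities) are simple counts over an ungapped alignment. The Python sketch below makes that calculation explicit; the function name and the two 13-residue test segments are hypothetical and are not the CFTR or sodium channel sequences.

```python
def ungapped_identity(seq_a: str, seq_b: str) -> tuple[int, int, float]:
    """Identical positions, alignment length and percent identity for two equal-length segments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must be aligned without gaps and have equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches, len(seq_a), 100.0 * matches / len(seq_a)


# Hypothetical 13-residue segments that differ at the last 3 positions.
matches, length, percent = ungapped_identity("ACDEFGHIKLMNP", "ACDEFGHIKLWWW")
print(f"{matches}/{length} identical ({percent:.1f}%)")
```

Counting "functionally similar" residues, as in the 11% figure quoted earlier, would additionally require a grouping of amino acids by chemical property, which the text does not specify.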

In summary, features of the primary structure of the CFTR protein indicate that it has properties suitable for participation in the regulation and control of ion transport in the epithelial cells of tissues affected in CF. Secure attachment to the membrane in two regions serves to position its three major intracellular domains (nucleotide-binding folds 1 and 2 and the R-domain) near the cytoplasmic surface of the cell membrane, where they can modulate ion movement through channels formed either by CFTR transmembrane segments themselves or by other membrane proteins.

In view of the genetic data, the tissue specificity and the predicted properties of the CFTR protein, it is reasonable to conclude that CFTR is directly responsible for CF. It remains unclear, however, how CFTR is involved in the regulation of ion conductance across the apical membrane of epithelial cells.

It is possible that CFTR serves as an ion channel itself. As depicted in Figure 13, 10 of the 12 transmembrane regions contain one or more amino acids with charged side chains, a property similar to the brain sodium channel and the GABA receptor chloride channel subunits, where charged residues are present in 4 of the 6 and 3 of the 4 membrane-associated domains per subunit or repeat unit, respectively. The amphipathic nature of these transmembrane segments is believed to contribute to the channel-forming capacity of these molecules. Alternatively, CFTR may not be an ion channel but may instead serve to regulate ion channel activities. In support of the latter assumption, none of the purified polypeptides from trachea and kidney that are capable of reconstituting chloride channels in lipid membranes [Landry et al, Science 244:1469 (1989)] appears to be CFTR, if judged on the basis of molecular mass.

In either case, the presence of ATP-binding domains in CFTR suggests that ATP hydrolysis is directly involved in, and required for, the transport function. The high density of phosphorylation sites for PKA and PKC and the clusters of charged residues in the R-domain may both serve to regulate this activity. The deletion of a phenylalanine residue in the NBF may prevent proper binding of ATP, or the conformational change which this normally elicits, and consequently result in the observed insensitivity of the CF apical chloride conductance pathway to activation by PKA- or PKC-mediated phosphorylation. Since the predicted protein contains several domains and belongs to a family of proteins which frequently function as parts of multicomponent molecular systems, CFTR may also participate in epithelial tissue functions of activity or regulation not related to ion transport.

With the isolated CF gene (cDNA) now in hand, it is possible to define the basic biochemical defect in CF and to further elucidate the control of ion transport pathways in epithelial cells in general. Most important, knowledge gained thus far from the predicted structure of CFTR, together with additional information from studies of the protein itself, provides a basis for the development of improved means of treatment of the disease. In such studies, antibodies have been raised to the CFTR protein as later described.
PROTEIN PURIFICATION

The CFTR protein can be purified by methods selected on the basis of properties revealed by its sequence. For example, since it possesses the distinctive properties of an integral membrane protein, a membrane fraction of the epithelial cells in which it is highly expressed (e.g., the cultured colonic carcinoma cell line T84) is first isolated using established methods [J. E. Langridge, et al, Biochim. Biophys. Acta 751: 318 (1983)]. The peripheral proteins of these membranes are then removed by extraction with high salt concentrations, high pH or chaotropic agents such as lithium diiodosalicylate. All of the remaining integral proteins, including the CFTR protein, are then solubilized using a detergent such as octyl glucoside (Landry, et al, supra), CHAPS [D. J. Beros et al, J. Biol. Chem., 262: 10613 (1987)], or other compounds of similar action. Making use of the nucleotide-binding domains of CFTR, cibacron-blue affinity chromatography [S. T. Thompson et al. Proc. Natl. Acad. Sci. U.S.A. 72: 669 (1975)] is then used to bind the CFTR protein and remove it from other integral proteins of the detergent-stabilized mixture. Since CFTR is a glycoprotein, differential lectin chromatography can bring about further purification [Riordan et al. J. Biol. Chem. 254: 1270 (1979)]. Final purification to homogeneity is then achieved using other standard protein purification procedures, i.e., ion exchange chromatography, gel permeation chromatography, adsorption chromatography or isoelectric focussing, as necessary. Alternatively, single-step purification procedures are used, such as immunoaffinity chromatography using immobilized antibodies to the CFTR protein (or fragments thereof) or preparative polyacrylamide gel electrophoresis using advanced instrumentation such as the Applied Biosystems "230A HPEC System". Experience in the purification of P-glycoprotein [Riordan et al, supra], another member of the general category of nucleotide-binding transport-associated membrane proteins, facilitates the purification of the CFTR protein.

In addition to purification from tissues and cells in which the CFTR protein is highly expressed, similar procedures are used to purify CFTR from cells transfected with vectors containing the CF gene (cDNA) as described above. Protein products resulting from expression of modified versions of the cDNA sequence are purified in a similar manner. Criteria of homogeneity of the protein so provided include those standard to the field of protein chemistry, including one- and two-dimensional gel electrophoresis and N-terminal amino acid determination. The purified protein is used in further physical and biochemical analysis to determine features of its secondary and tertiary structure, to aid in the design of drugs to promote the proper functioning of the mutant CF forms. In preparation for use in protein therapy, the absence of potentially toxic contaminating substances is considered. It is recognized that the hydrophobic nature of the protein necessitates the inclusion of amphiphilic compounds such as detergents and others [J. V. Ambudkar and P. C. Maloney, J. Biol. Chem. 261: 10079 (1986)] at all stages of its handling.
CF SCREENING

DNA BASED DIAGNOSIS

Given the knowledge of the major mutation as disclosed herein, carrier screening and prenatal diagnosis can be carried out as follows.
The high-risk population for cystic fibrosis is Caucasians. For example, each Caucasian woman and/or man of child-bearing age would be screened to determine if she or he was a carrier (approximately a 5% probability for each individual). If both are carriers, they are a couple at risk for a cystic fibrosis child. Each child of an at-risk couple has a 25% chance of being affected with cystic fibrosis. The procedure for determining carrier status using the probes disclosed herein is as follows.

One major application of the DNA sequence information of the normal and mutant CF genes is in the area of genetic testing, carrier detection and prenatal diagnosis. Individuals carrying mutations in the CF gene (disease carriers or patients) may be detected at the DNA level with the use of a variety of techniques. The genomic DNA used for the diagnosis may be obtained from body cells, such as those present in peripheral blood, urine, saliva, tissue biopsy, surgical specimens and autopsy material. The DNA may be used directly for detection of a specific sequence or may be amplified enzymatically in vitro by using PCR (Saiki et al. Science 230: 1350-1353 (1985); Saiki et al. Nature 324: 163-166 (1986)) prior to analysis. RNA or its cDNA form may also be used for the same purpose. Recent reviews of this subject have been presented by Caskey (Science 236: 1223-8 (1989)) and by Landegren et al (Science 242: 229-237 (1989)).

The detection of a specific DNA sequence may be achieved by methods such as hybridization using specific oligonucleotides (Wallace et al. Cold Spring Harbor Symp. Quant. Biol. 51: 257-261 (1986)), direct DNA sequencing (Church and Gilbert, Proc. Natl. Acad. Sci. U.S.A. 81: 1991-1995 (1988)), the use of restriction enzymes (Flavell et al. Cell 15: 25 (1978); Geever et al. Proc. Natl. Acad. Sci. U.S.A. 78: 5081 (1981)), discrimination on the basis of electrophoretic mobility in gels with denaturing reagent (Myers and Maniatis, Cold Spring Harbor Symp. Quant. Biol. 51: 275-284 (1986)), RNase protection (Myers, R. M., Larin, J., and T. Maniatis, Science 230: 1242 (1985)), chemical cleavage (Cotton et al. Proc. Natl. Acad. Sci. U.S.A. 85: 4397-4401 (1985)) and the ligase-mediated detection procedure [Landegren et al. Science 241: 1077 (1988)].
Oligonucleotides specific to normal or mutant sequences are chemically synthesized using commercially available machines, labelled radioactively with isotopes (such as 32P) or non-radioactively with tags such as biotin (Ward and Langer et al. Proc. Natl. Acad. Sci. U.S.A. 78: 6633-6657 (1981)), and hybridized to individual DNA samples immobilized on membranes or other solid supports by dot-blot or by transfer from gels after electrophoresis. The presence or absence of these specific sequences is visualized by methods such as autoradiography, fluorometric reactions (Landegren et al, 1989, supra) or colorimetric reactions (Gebeyehu et al. Nucleic Acids Research, 15: 4513-4534 (1987)). An embodiment of this oligonucleotide screening method has been applied in the detection of the F508 deletion as described herein.
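The logic of allele-specific oligonucleotide screening can be sketched in Python under a deliberately simple perfect-match model: a probe is taken to hybridize only if its exact reverse complement is present in the target strand. Real discrimination depends on hybridization and washing stringency, which this model ignores. All sequences and probes below are hypothetical illustrations, not the probes of Figure 15.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """Perfect-match model: the probe anneals only if its exact complement is present."""
    return reverse_complement(probe) in target.upper()


def classify(target: str, normal_probe: str, mutant_probe: str) -> str:
    """Report which allele-specific probe(s) hybridize to the target."""
    normal_hit = hybridizes(normal_probe, target)
    mutant_hit = hybridizes(mutant_probe, target)
    if normal_hit and mutant_hit:
        return "both"
    if normal_hit:
        return "normal"
    if mutant_hit:
        return "mutant"
    return "neither"


# Hypothetical target strands and probes (illustrative only).
NORMAL_TARGET = "ATCATCTTTGGTGTT"
MUTANT_TARGET = "ATCATTGGTGTT"   # same region with a 3 bp deletion
NORMAL_PROBE = "ACCAAAGATG"      # complementary to CATCTTTGGT
MUTANT_PROBE = "CACCAATGAT"      # complementary to ATCATTGGTG

print(classify(NORMAL_TARGET, NORMAL_PROBE, MUTANT_PROBE))
print(classify(MUTANT_TARGET, NORMAL_PROBE, MUTANT_PROBE))
```

Each probe is chosen so that it spans the altered site; a probe lying entirely outside the deletion would hybridize to both alleles and could not discriminate them.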

Sequence differences between normal and mutant genes may be revealed by the direct DNA sequencing method of Church and Gilbert (supra). Cloned DNA segments may be used as probes to detect specific DNA segments. The sensitivity of this method is greatly enhanced when combined with PCR [Wrischnik et al, Nucleic Acids Res. 15:529-542 (1987); Wong et al, Nature 330:384-386 (1987); Stoflet et al, Science 239:491-494 (1988)]. In the latter procedure, a sequencing primer which lies within the amplified sequence is used with double-stranded PCR product or a single-stranded template generated by a modified PCR. The sequence determination is performed by conventional procedures with radiolabeled nucleotides or by automatic sequencing procedures with fluorescent tags.

Sequence alterations may occasionally generate fortuitous restriction enzyme recognition sites, which are revealed by the use of appropriate enzyme digestion followed by conventional gel-blot hybridization (Southern, J. Mol. Biol. 98: 503 (1975)). DNA fragments carrying the site (either normal or mutant) are detected by their reduction in size or by an increase in the corresponding number of restriction fragments. Genomic DNA samples may also be amplified by PCR prior to treatment with the appropriate restriction enzyme; fragments of different sizes are then visualized under UV light in the presence of ethidium bromide after gel electrophoresis.
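The restriction approach above predicts different fragment sizes depending on whether an alteration creates or destroys a recognition site. The Python sketch below computes fragment lengths for a linear molecule. It uses EcoRI (site GAATTC, cut between G and A) purely as an example enzyme; the two short "PCR products" are hypothetical and differ by a single substitution that creates the site. Scanning only the top strand is sufficient here because the EcoRI site is palindromic; a non-palindromic site would also require scanning the reverse complement.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear molecule at every occurrence of site.

    cut_offset is the top-strand cut position within the site (1 for EcoRI, G^AATTC).
    """
    cuts = [i + cut_offset for i in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical PCR products; one substitution (C -> G) creates an EcoRI site in the mutant.
normal = "TTAGCAATTCCTAAGG"
mutant = "TTAGGAATTCCTAAGG"
print(digest(normal, "GAATTC", 1), digest(mutant, "GAATTC", 1))
```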

Genetic testing based on DNA sequence differences may be achieved by detection of alterations in the electrophoretic mobility of DNA fragments in gels with or without denaturing reagent. Small sequence deletions and insertions can be visualized by high-resolution gel electrophoresis. For example, the PCR product with the 3 bp deletion is clearly distinguishable from the normal sequence on an 8% non-denaturing polyacrylamide gel. DNA fragments of different sequence compositions may be distinguished on a denaturing formamide gradient gel, in which the mobilities of different DNA fragments are retarded at different positions in the gel according to their specific "partial-melting" temperatures (Myers, supra). In addition, sequence alterations, in particular small deletions, may be detected as changes in the migration pattern of DNA heteroduplexes in non-denaturing gel electrophoresis, as has been shown for the 3 bp (F508) mutation and in other experimental systems [Nagamine et al, Am. J. Hum. Genet. 45:337-339 (1989)]. Alternatively, a method of detecting a mutation comprising a single base substitution or other small change could be based on differential primer length in a PCR. For example, one invariant primer could be used in addition to a primer specific for a mutation. The PCR products of the normal and mutant genes can then be differentially detected in acrylamide gels.
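Both PCR-based ideas above, the 3 bp size difference with common primers and amplification that depends on a mutation-specific primer, can be illustrated with an in-silico PCR under a perfect-match assumption. In the Python sketch below, the forward primer must occur exactly on the top strand and the reverse primer's reverse complement must occur downstream of it. The templates, flanking sequences and primers are all hypothetical; real primer specificity depends on 3'-end mismatches and annealing conditions that this model does not capture.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pcr_product(template: str, forward: str, reverse: str) -> Optional[str]:
    """Predicted amplicon from the forward primer to the reverse primer's binding site."""
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(site)]


# Hypothetical templates: identical flanks around a normal or 3 bp-deleted core.
normal = "GGCCTTAA" + "ATCATCTTTGGTGTT" + "CCGGATTA"
deleted = "GGCCTTAA" + "ATCATTGGTGTT" + "CCGGATTA"
FORWARD, REVERSE = "GGCCTTAA", "TAATCCGG"  # invariant primers
MUTANT_SPECIFIC = "ATCATTGG"               # hypothetical deletion-specific primer

for name, template in (("normal", normal), ("deleted", deleted)):
    common = pcr_product(template, FORWARD, REVERSE)
    specific = pcr_product(template, MUTANT_SPECIFIC, REVERSE)
    print(name, len(common), None if specific is None else len(specific))
```

With the invariant primers the two products differ by exactly 3 bp, which is the difference resolved on the 8% gel; with the deletion-specific primer only the mutant template yields a product.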

Sequence changes at specific locations may also be revealed by nuclease protection assays, such as RNase (Myers, supra) and S1 protection (Berk, A. J., and P. A. Sharp, Proc. Natl. Acad. Sci. U.S.A. 75: 1274 (1978)), the chemical cleavage method (Cotton, supra) or the ligase-mediated detection procedure (Landegren, supra).

In addition to conventional gel electrophoresis and blot-hybridization methods, DNA fragments may also be visualized by methods where the individual DNA samples are not immobilized on membranes. The probe and target sequences may both be in solution, or the probe sequence may be immobilized [Saiki et al, Proc. Natl. Acad. Sci. U.S.A. 86:6230-6234 (1989)]. A variety of detection methods, such as autoradiography involving radioisotopes, direct detection of radioactive decay (in the presence or absence of scintillant), spectrophotometry involving colorigenic reactions and fluorometry involving fluorogenic reactions, may be used to identify specific individual genotypes.

Since more than one mutation is anticipated in the CF gene, a multiplex system is an ideal protocol for screening CF carriers and detecting specific mutations. For example, a PCR with multiple, specific oligonucleotide primers and hybridization probes may be used to identify all possible mutations at the same time (Chamberlain et al. Nucleic Acids Research 16: 1141-1155 (1988)). The procedure may involve immobilized sequence-specific oligonucleotide probes (Saiki et al, supra).

DETECTING THE MAJOR MUTATION



WO 91/02796 



81 



PCT/CA90/00267 



These detection methods may be applied to prenatal diagnosis using amniotic fluid cells, chorionic villus biopsy or fetal cells sorted from the maternal circulation. The test for CF carriers in the population may be incorporated as an essential component in a broad-scale genetic testing program for common diseases.

According to an embodiment of the invention, the portion of the DNA segment that is informative for a mutation, such as the mutation according to this embodiment, that is, the portion that immediately surrounds the F508 deletion, can then be amplified by using standard PCR techniques [as reviewed in Landegren, Ulf, Robert Kaiser, C. Thomas Caskey, and Leroy Hood, DNA Diagnostics - Molecular Techniques and Automation, in Science 242: 229-237 (1988)]. It is contemplated that the portion of the DNA segment which is used may be a single DNA segment or a mixture of different DNA segments. A detailed description of this technique now follows.

A specific region of genomic DNA from the person or fetus is to be screened. Such a specific region is defined by the oligonucleotide primers C16B (5'GTTTTCCTGGATTATGCCTGGGCAC3') and C16D (5'GTTGGCATGCTTTGATGACGCTTC3'). The specific regions were amplified by the polymerase chain reaction (PCR). 200-400 ng of genomic DNA, from either cultured lymphoblasts or peripheral blood samples of CF individuals and their parents, were used in each PCR with the oligonucleotide primers indicated above. The oligonucleotides were purified with Oligonucleotide Purification Cartridges™ (Applied Biosystems) or NENSORB™ PREP columns (DuPont) with procedures recommended by the suppliers. The primers were annealed at 62°C for 45 sec, extended at 72°C for 120 sec (with 2 units of Taq DNA polymerase) and denatured at 94°C for 60 sec, for 28 cycles with a final cycle of 7 min for
extension in a Perkin-Elmer/Cetus automatic thermocycler with a Step-Cycle program (transition setting at 1.5 min). Portions of the PCR products were separated by electrophoresis on 1.4% agarose gels and transferred to Zetabind (Biorad) membrane according to standard procedures. The two oligonucleotide probes of Figure 15 (10 ng each) were labeled separately with 10 units of T4 polynucleotide kinase (Pharmacia) in a 10 µl reaction containing 50 mM Tris-HCl (pH 7.6), 10 mM MgCl2, 0.5 mM dithiothreitol, 10 mM spermidine, 1 mM EDTA and 30-40 µCi of γ[32P]-ATP for 20-30 min at 37°C. The unincorporated radionucleotides were removed with a Sephadex G-25 column before use. The hybridization conditions were as described previously (J.M. Rommens et al, Am. J. Hum. Genet. 43, 645 (1988)) except that the temperature was 37°C. The membranes were washed twice at room temperature with 5 x SSC and twice at 39°C with 2 x SSC (1 x SSC = 150 mM NaCl and 15 mM Na citrate).
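Because the wash buffers are specified as multiples of 1 x SSC, their salt content follows by simple scaling. The Python sketch below does that arithmetic; the 20 x SSC stock it dilutes from is an assumption for illustration (a common laboratory stock), not something stated in the text.

```python
# Composition of 1 x SSC as defined in the text (millimolar).
SSC_1X_MM = {"NaCl": 150.0, "Na citrate": 15.0}


def ssc_composition(strength: float) -> dict:
    """Millimolar NaCl and sodium citrate in an n x SSC solution."""
    return {salt: strength * mm for salt, mm in SSC_1X_MM.items()}


def stock_volume_ml(final_strength: float, final_ml: float, stock_strength: float = 20.0) -> float:
    """Volume of concentrated SSC stock needed, from C1*V1 = C2*V2.

    The default 20 x stock is an assumption, not taken from the text.
    """
    return final_strength * final_ml / stock_strength


if __name__ == "__main__":
    for n in (5, 2):
        print(f"{n} x SSC:", ssc_composition(n), f"({stock_volume_ml(n, 1000):.0f} ml stock per litre)")
```

On this basis the room-temperature washes are at 750 mM NaCl / 75 mM citrate and the 39°C washes at 300 mM / 30 mM.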
Autoradiography was performed at room temperature overnight. The autoradiographs show the hybridization results of genomic DNA with the 2 specific oligonucleotide probes as indicated in Figure 15. Probe C detects the normal DNA sequence and Probe F detects the mutant sequence. The genomic DNA sample from each family member was amplified by the polymerase chain reaction, and the products were separated by electrophoresis on a 1.4% agarose gel and then transferred to Zetabind (Biorad) membrane according to standard procedures. A water blank and plasmid DNA, T16 and C16, corresponding to the normal sequence (N) and the F508 deletion (CF), respectively, were included as controls.
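The cycling program given earlier in this section for the C16B/C16D amplification can be written down as data, which makes the run length easy to check. In this Python sketch the `Step` class and constant names are hypothetical, and the Step-Cycle transition time (1.5 min setting) is deliberately excluded because the text does not say how it adds to the run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# One cycle as listed in the text: anneal, extend, denature.
CYCLE = (
    Step("anneal", 62.0, 45),
    Step("extend", 72.0, 120),
    Step("denature", 94.0, 60),
)
N_CYCLES = 28
FINAL_EXTENSION_S = 7 * 60  # "final cycle of 7 min for extension"


def total_hold_minutes(cycle, n_cycles: int, final_s: int) -> float:
    """Minutes spent holding at set temperatures; ramp/transition time is ignored."""
    per_cycle = sum(step.seconds for step in cycle)
    return (per_cycle * n_cycles + final_s) / 60


if __name__ == "__main__":
    print(f"{total_hold_minutes(CYCLE, N_CYCLES, FINAL_EXTENSION_S):.0f} min at set temperatures")
```

Each cycle holds for 225 s, so the 28 cycles plus the final extension amount to about 112 minutes before transition time is added.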

The 3 bp deletion was also revealed by polyacrylamide gel electrophoresis. When the PCR products generated with the above-mentioned C16B and C16D primers were applied to an 8% polyacrylamide gel and electrophoresed for 2 hrs at 20 V/cm in a 90 mM Tris-borate buffer (pH 8.3), DNA fragments of different mobility were clearly detectable for individuals without the 3 bp deletion and for individuals heterozygous or homozygous for the deletion. In addition, an extra DNA band, presumably the heteroduplex between normal and mutant DNA strands, was noted in heterozygotes. Similar alterations in gel mobility for heteroduplexes formed during PCR have also been reported for experimental systems where small deletions are involved (Nagamine et al, supra). These mobility shifts may be used as the basis for non-radioactive genetic screening tests.

5.3 CF SCREENING PROGRAMS

It is appreciated that only 70% of the carriers can be detected using the specific F508 probes of this particular embodiment of the invention. Thus, if an individual tested is not a carrier according to the F508 probes, their carrier status cannot be excluded; they may carry some other mutation, as previously noted. However, if both the individual tested and the individual's spouse are carriers of the F508 mutation, it can be stated with certainty that they are an at-risk couple. The sequence of the gene as disclosed herein is an essential prerequisite for the determination of the other mutations.

Prenatal diagnosis is a logical extension of carrier screening. A couple can be identified as at risk for having a cystic fibrosis child in one of two ways. If they already have a cystic fibrosis child, they are both, by definition, obligate carriers of the disease, and each subsequent child has a 25% chance of being affected with cystic fibrosis. Alternatively, and this is a major advantage of the present invention, the need for family pedigree analysis is eliminated: a gene mutation screening program as outlined above, or another similar method, can be used to identify a genetic mutation that leads to a protein with altered function. This is not dependent on prior ascertainment of the family through an affected child. Fetal DNA samples, for example, can be obtained, as previously mentioned, from amniotic fluid cells and chorionic villi specimens. Amplification by standard PCR techniques can then be performed on this template DNA.
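The 25% figure for a child of two obligate carriers, and the 50% carrier probability used below, follow from enumerating the four equally likely allele combinations. A minimal Python sketch, writing the normal allele as "N" and the F508 deletion as "F" (labels chosen here for illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1: str, parent2: str) -> dict:
    """Genotype probabilities for one child of two parents.

    Each parent is written as two alleles, e.g. "NF"; each allele is passed on
    with probability 1/2. 'N' = normal CF gene, 'F' = F508 deletion.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    for genotype, p in sorted(offspring_distribution("NF", "NF").items()):
        print(genotype, p)
```

For two carrier parents this gives 1/4 affected (FF), 1/2 carriers (FN) and 1/4 non-carriers (NN).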

If both parents are shown to be carriers of the F508 deletion, the results would be interpreted as follows. If the fetal DNA hybridizes to the normal (no deletion, as shown in Figure 15) probe, the fetus will not be affected with cystic fibrosis, although it may be a CF carrier (50% probability for each fetus of an at-risk couple). If the fetal DNA hybridizes only to the F508 deletion probe and not to the normal probe (as shown in Figure 15), the fetus will be affected with cystic fibrosis.
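The interpretation rules for a fetus of two F508 carriers can be expressed as a small decision function. In this Python sketch, separating "carrier" from "non-carrier" using the deletion-probe signal, and rejecting a result with no signal at all, are our additions; the text itself only distinguishes affected from not affected.

```python
def interpret_fetal_result(normal_probe: bool, deletion_probe: bool) -> str:
    """Interpret allele-specific hybridization of amplified fetal DNA.

    Assumes both parents are heterozygous for the F508 deletion, so the fetus
    can only carry normal or F508-deleted alleles.
    """
    if normal_probe and deletion_probe:
        return "not affected; carrier"
    if normal_probe:
        return "not affected; non-carrier"
    if deletion_probe:
        return "affected"
    # Neither probe hybridized: the assay failed (this branch is our addition).
    raise ValueError("no hybridization signal; repeat the assay")
```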

It is appreciated that, for this and other mutations in the CF gene, a range of different specific procedures can be used to provide a complete diagnosis for all potential CF carriers or patients. These procedures are described more fully later.

The invention therefore provides a method and kit for determining if a subject is a CF carrier or CF patient. In summary, the screening method comprises the steps of:

providing a biological sample of the subject to be screened; and providing an assay for detecting in the biological sample the presence of at least a member from the group consisting of the normal CF gene, normal CF gene products, a mutant CF gene, mutant CF gene products and mixtures thereof.

The method may be further characterized by including at least one more nucleotide probe which is a different DNA sequence fragment of, for example, the DNA of Figure 1, or a different DNA sequence fragment of human chromosome 7 located to either side of the DNA sequence of Figure 1.

A kit, according to an embodiment of the invention, suitable for use in the screening technique and for assaying for the presence of the CF gene by an immunoassay comprises:

(a) an antibody which specifically binds to a gene product of the CF gene;

(b) reagent means for detecting the binding of the antibody to the gene product; and

(c) the antibody and reagent means each being present in amounts effective to perform the immunoassay.

The kit for assaying for the presence of the CF gene may also be provided by hybridization techniques. The kit comprises:

(a) an oligonucleotide probe which specifically binds to the CF gene;

(b) reagent means for detecting the hybridization of the oligonucleotide probe to the CF gene; and

(c) the probe and reagent means each being present in amounts effective to perform the hybridization assay.

5.4 ANTIBODIES TO DETECT CFTR

As mentioned, antibodies to epitopes within the CFTR protein are raised to provide extensive information on the characteristics of the protein and other valuable information, which includes:

1. To enable visualization of the protein in cells and tissues in which it is expressed by immunoblotting ("Western blots") following polyacrylamide gel electrophoresis. This allows an estimation of the molecular size of the mature protein, including the contribution of moieties added post-translationally by the cells, such as oligosaccharide chains and phosphate groups. Immunocytochemical techniques including immunofluorescence and immuno-electron microscopy can be used to establish the subcellular localization of the protein in cell membranes. The antibodies can also provide another technique for detecting any of the other CF mutations which result in the synthesis of a protein with an altered size.

2. Antibodies to distinct domains of the protein can be used to determine the topological arrangement of the protein in the cell membrane. This provides information on segments of the protein which are accessible to externally added modulating agents for purposes of drug therapy.

3. The structure-function relationships of portions of the protein can be examined using specific antibodies. For example, it is possible to introduce into cells antibodies recognizing each of the charged cytoplasmic loops which join the transmembrane sequences, as well as portions of the nucleotide binding folds and the R-domain. The influence of these antibodies on functional parameters of the protein provides insight into cell regulatory mechanisms and potentially suggests means of modulating the activity of the defective protein in a CF patient.

4. Antibodies with the appropriate avidity also enable immunoprecipitation and immuno-affinity purification of the protein. Immunoprecipitation will facilitate characterization of synthesis and post-translational modification, including ATP binding and phosphorylation. Purification will be required for studies of protein structure and for reconstitution of its function, as well as for protein-based therapy.

In order to prepare the antibodies, fusion proteins containing defined portions of CFTR polypeptides have been synthesized in bacteria by expression of
corresponding DNA sequences in a suitable cloning vehicle, whereas smaller peptides were synthesized chemically, as described in Table 8. The fusion proteins were purified, for example, by affinity chromatography on glutathione-agarose, and the peptides were coupled to a carrier protein (hemocyanin), mixed with Freund's adjuvant and injected into rabbits. Following booster injections at bi-weekly intervals, the rabbits were bled and sera isolated. The stained fusion proteins are shown in Figure 19a: lane 1, uninduced control plasmid; lane 2, IPTG-induced control plasmid expressing just glutathione-S-transferase (GST); lane 3, affinity purified GST band at 27 kilodaltons (kD); lane 4 is uninduced, lane 5 is induced and lane 6 is the purified fusion protein #1 of Table 8. In Figure 19b, the gel electrophoresis is of lysates from bacteria transformed with pGEX plasmids containing fusion protein #5 of Table 8 (lanes 1 and 2) and fusion protein #2 of Table 8 (lanes 3 and 4). Lane 1 of Figure 19b is for the uninduced plasmid, whereas lane 2 is for the induced plasmid expressing fusion protein #5. Lane 3 of Figure 19b is for the uninduced plasmid, whereas lane 4 is for the induced plasmid expressing fusion protein #2. Immunoblots of fusion protein #1 probed with antisera obtained from the second bleeds of two different rabbits are shown in Figure 20. The staining is with alkaline-phosphatase conjugated second antibody [Blake et al, Anal. Biochem. 136:175 (1984)]. Both of these immune sera stain the 32 kD fusion protein, whereas the preimmune sera do not. Figure 21 shows the reactivity of one of these immune sera with a band of approximately 200 kD in size in membranes isolated from T-84 colonic carcinoma cells, which express the CFTR transcript at a high level. This band is in the size range which might be expected for the CFTR protein, which has a predicted molecular weight of 169 kD prior to post-translational modifications.
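The band sizes above can be sanity-checked with a crude average-residue-mass estimate. In this Python sketch the ~110 Da average residue mass is our assumption (a common rule of thumb, not from the text); the 27 kD GST size and the residue range of fusion protein #1 (204-249, Table 8) come from this section.

```python
AVG_RESIDUE_KD = 0.110  # assumed average residue mass (~110 Da); not from the text
GST_KD = 27.0           # size of the purified GST band reported in the text


def n_residues(first: int, last: int) -> int:
    """Number of residues in an inclusive range such as 204-249."""
    return last - first + 1


def estimate_kd(residues: int, carrier_kd: float = 0.0) -> float:
    """Approximate mass in kD of a polypeptide, optionally fused to a carrier."""
    return carrier_kd + residues * AVG_RESIDUE_KD


if __name__ == "__main__":
    fusion1 = estimate_kd(n_residues(204, 249), GST_KD)  # fusion protein #1 of Table 8
    cftr = estimate_kd(1480)                             # full-length CFTR, residues 1-1480
    print(f"GST fusion #1 ~ {fusion1:.0f} kD; CFTR polypeptide ~ {cftr:.0f} kD")
```

The ~32 kD estimate matches the band stained by the immune sera. The crude ~163 kD figure for the unmodified CFTR polypeptide sits slightly below the 169 kD predicted from the actual sequence, and both lie well below the ~200 kD band, leaving room for the post-translational additions discussed above.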

Sera from rabbits immunized with the KLH conjugate of peptide #2 were screened against both pure peptide and KLH, as shown in Figure 22. In this Figure, H denotes hemocyanin; P1, peptide #1; P2, peptide #2. The amounts of protein or peptide dotted are indicated in ng. This antiserum detects as little as 1 ng of the peptide and does not react at all with control peptide #1. Thus, it is possible to raise polyclonal antibodies specific both for fusion proteins containing portions of the CFTR protein and for peptides corresponding to short segments of its sequence. Similarly, mice can be injected with KLH conjugates of peptides 1, 2 and 7 of Table 8 to initiate the production of monoclonal antibodies to these segments of CFTR protein. Monoclonal antibodies can be similarly raised to other domains of the CFTR protein.

As for the generation of polyclonal antibodies, immunogens for the raising of monoclonal antibodies (mAbs) to the CFTR protein are bacterial fusion proteins [Smith et al, Gene 67:31 (1988)] containing portions of the CFTR polypeptide, or synthetic peptides corresponding to short (12 to 25 amino acids in length) segments of the sequence. The essential methodology is that of Kohler and Milstein [Nature 256: 495 (1975)].

Balb/c mice are immunized by intraperitoneal injection with 500 ng of pure fusion protein or synthetic peptide in incomplete Freund's adjuvant. A second injection is given after 14 days, a third after 21 days and a fourth after 28 days. Individual animals so immunized are sacrificed one, two and four weeks following the final injection. Spleens are removed, and their cells are dissociated, collected and fused with Sp2/0-Ag14 myeloma cells according to Gefter et al, Somatic Cell Genetics 3:231 (1977). The fusion mixture is distributed in culture medium selective for the propagation of fused cells, which are grown until they are about 25% confluent. At this time, culture supernatants are tested for the presence of antibodies reacting with a particular CFTR antigen. An alkaline phosphatase labelled anti-mouse second antibody is then used for detection of positives. Cells from positive culture wells are then expanded in culture, their supernatants collected for further testing and the cells stored deep frozen in cryoprotectant-containing medium. To obtain large quantities of a mAb, producer cells are injected into the peritoneum at 5 x 10^6 cells per animal, and ascites fluid is obtained. Purification is by chromatography on Protein G- or Protein A-agarose according to Ey et al, Immunochemistry 15:429 (1977).
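The immunization timetable above can be laid out as dates. This Python sketch reads "after 14 days, 21 days and 28 days" as counted from the first injection; that reading, the function name and the start date in the usage example are assumptions for illustration.

```python
from datetime import date, timedelta

# Days of injection, counted from the first injection (our reading of the text).
INJECTION_DAYS = (0, 14, 21, 28)
# Animals are sacrificed one, two and four weeks after the final injection.
HARVEST_WEEKS_AFTER_LAST = (1, 2, 4)


def immunization_schedule(start: date):
    """Return (injection dates, spleen-harvest dates) for one mouse cohort."""
    injections = [start + timedelta(days=d) for d in INJECTION_DAYS]
    last = injections[-1]
    harvests = [last + timedelta(weeks=w) for w in HARVEST_WEEKS_AFTER_LAST]
    return injections, harvests


if __name__ == "__main__":
    inj, harv = immunization_schedule(date(2024, 1, 1))  # hypothetical start date
    print("inject:", [d.isoformat() for d in inj])
    print("harvest:", [d.isoformat() for d in harv])
```

Under this reading, spleens are harvested on days 35, 42 and 56 after the first injection.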

Reactivity of these mAbs with the CFTR protein is confirmed by polyacrylamide gel electrophoresis of membranes isolated from epithelial cells in which it is expressed, followed by immunoblotting [Towbin et al, Proc. Natl. Acad. Sci. USA 76:4350 (1979)].

In addition to the use of monoclonal antibodies specific for each of the different domains of the CFTR protein to probe their individual functions, other mAbs, which can distinguish between the normal and mutant forms of CFTR protein, are used to detect the mutant protein in epithelial cell samples obtained from patients, such as nasal mucosa biopsy "brushings" [R. De-Lough and J. Rutland, J. Clin. Pathol. 42, 613 (1989)] or skin biopsy specimens containing sweat glands.

Antibodies capable of this distinction are obtained by differentially screening hybridomas from paired sets of mice immunized with a peptide containing the phenylalanine at amino acid position 508 (e.g. GTIKENIIFGVSY) or a peptide which is identical except for the absence of F508 (GTIKENIIGVSY). mAbs capable of recognizing the other mutant forms of CFTR protein present in patients in addition to, or instead of, the F508 deletion are obtained using similar monoclonal antibody production strategies.
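The two immunizing peptides differ only by the residue at position 508. The Python sketch below derives the deleted peptide from the normal one; the normal sequence GTIKENIIFGVSY is our reading of the peptide, chosen because it spans residues 500-512 (peptide 6 of Table 8) and differs from the F508-deleted peptide GTIKENIIGVSY only by the phenylalanine.

```python
# CFTR residues 500-512 (peptide 6 of Table 8), written N- to C-terminal.
WILD_TYPE = "GTIKENIIFGVSY"
FIRST_RESIDUE = 500


def delete_residue(peptide: str, position: int, first: int = FIRST_RESIDUE) -> str:
    """Remove the residue at protein position `position` from a peptide starting at `first`."""
    i = position - first
    if not 0 <= i < len(peptide):
        raise ValueError("position outside the peptide")
    return peptide[:i] + peptide[i + 1:]


if __name__ == "__main__":
    assert WILD_TYPE[508 - FIRST_RESIDUE] == "F"  # F508 is the phenylalanine at index 8
    print(delete_residue(WILD_TYPE, 508))
```

Running it prints the 12-residue peptide GTIKENIIGVSY used for the differential screen.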
Antibodies to normal and CF versions of CFTR protein and of segments thereof are used diagnostically in immunocytochemical and immunofluorescence light microscopy and in immunoelectron microscopy to demonstrate the tissue, cellular and subcellular distribution of CFTR within the organs of CF patients, carriers and non-CF individuals.

Antibodies are used therapeutically to modulate, by promoting, the activity of the CFTR protein in CF patients and in cells of CF patients. Possible modes of such modulation might involve stimulation due to cross-linking of CFTR protein molecules with multivalent antibodies, in analogy with the stimulation of some cell surface membrane receptors, such as the insulin receptor [O'Brien et al, Euro. Mol. Biol. Organ. J. 6:4003 (1987)], the epidermal growth factor receptor [Schreiber et al, J. Biol. Chem. 258:846 (1983)] and T-cell receptor-associated molecules such as CD4 [Veillette et al, Nature 338:257 (1989)].

Antibodies are used to direct the delivery of therapeutic agents to the cells which express defective CFTR protein in CF. For this purpose, the antibodies are incorporated into a vehicle such as a liposome [Katthay et al, Cancer Res. 46:4904 (1986)] which carries the therapeutic agent, such as a drug or the normal gene.
TABLE 8

CFTR FRAGMENTS USED TO RAISE ANTIBODIES

GST(a) fusion proteins              CFTR domain of Fig. 13
containing CFTR residues:

1.  204-249                         TM3, Ext. 2, TM4
2.  347-698                         NBF-1, N-term. 1/2 R-domain
3.  710-757                         Neg. charged middle of R-domain
4.  758-796                         Pos. charged segment of R-domain
5.  1188-1480                       C-term. cyto. domain with NBF-2

KLH(b) conjugates
containing CFTR peptides:

1.  28-45                           N-term. cytoplasmic
2.  58-75                           N-term. cytoplasmic
3.  104-117                         1st extracellular
4.  139-153                         2nd cytoplasmic
5.  279-294                         N-term. of 3rd cytoplasmic
6.  500-512                         NBF-1; around the F508 deletion
7.  725-739                         Charged middle of R-domain
8.  933-946                         5th cytoplasmic
9.  1066-1084                       6th cytoplasmic

a. Restriction fragments coding for these fragments ligated to the 3' end of glutathione S-transferase (GST) of Schistosoma japonicum in the pGEX plasmid expression vector as identified in Smith et al, Gene 67:31 (1988).

b. Peptides coupled through an N-terminal cysteine to the carrier protein keyhole limpet hemocyanin (KLH) according to Green et al, Cell 28:477 (1982). TM denotes transmembrane sequences.
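A short script can cross-check the inclusive residue ranges of Table 8 against the statement earlier in this section that the synthetic peptides are 12 to 25 residues long. This Python sketch simply transcribes the table; the dictionary names are ours.

```python
# Residue ranges from Table 8 (inclusive).
GST_FUSIONS = {1: (204, 249), 2: (347, 698), 3: (710, 757), 4: (758, 796), 5: (1188, 1480)}
KLH_PEPTIDES = {
    1: (28, 45), 2: (58, 75), 3: (104, 117), 4: (139, 153), 5: (279, 294),
    6: (500, 512), 7: (725, 739), 8: (933, 946), 9: (1066, 1084),
}


def span_length(span) -> int:
    """Number of residues in an inclusive (first, last) range."""
    first, last = span
    return last - first + 1


if __name__ == "__main__":
    peptide_lengths = {k: span_length(v) for k, v in KLH_PEPTIDES.items()}
    print("peptide lengths:", peptide_lengths)
    # The text describes the synthetic peptides as 12 to 25 residues long.
    print("all within 12-25:", all(12 <= n <= 25 for n in peptide_lengths.values()))
    # Fusions 3 and 4 split the R-domain at adjacent residues 757/758.
    print("3 and 4 adjacent:", GST_FUSIONS[3][1] + 1 == GST_FUSIONS[4][0])
    print("fusion lengths:", {k: span_length(v) for k, v in GST_FUSIONS.items()})
```

All nine peptides fall between 13 and 19 residues, and the fusion inserts range from 39 to 352 residues.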
*VTJ> ANALYSIS

This invention provides a number of benefits stemming directly from the discovery and characterization of the CF gene which are of immediate practical application. The amino acid sequence of CFTR provides insight into the structure and function of the protein, as well as into the molecular mechanisms in which CFTR participates and which are defective in cystic fibrosis. This information enables the generation of further tools and concepts in research on, and therapy for, this disease.

Carrier detection, DNA diagnosis and family counselling are some of the applications of the invention. Previously, DNA-based genetic testing for CF has primarily been available to families with affected children and to their close relatives. Knowledge of the CF mutations at the DNA sequence level permits testing of any random individual; our estimate shows that 46% of CF patients without a previous family history can be accurately diagnosed by DNA analysis, and 68% of the CF carriers in the population can be identified via the F508 deletion.
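The 46% figure appears consistent with the 68% figure if a patient's two CF chromosomes are assumed to carry mutations independently: the F508 test alone settles the diagnosis when both chromosomes carry the deletion, 0.68² ≈ 0.46. This derivation is our reconstruction; the text does not state how its estimate was obtained. A minimal Python sketch:

```python
def fraction_homozygous(allele_fraction: float) -> float:
    """Share of patients with the mutation on both CF chromosomes, assuming the two
    chromosomes of a patient are drawn independently (random mating)."""
    return allele_fraction ** 2


def fraction_with_at_least_one(allele_fraction: float) -> float:
    """Share of patients with the mutation on at least one CF chromosome."""
    return 1 - (1 - allele_fraction) ** 2


if __name__ == "__main__":
    f508 = 0.68  # share of CF chromosomes (and carriers) with the F508 deletion, from the text
    print(f"homozygous: {fraction_homozygous(f508):.0%}")
    print(f"at least one F508: {fraction_with_at_least_one(f508):.0%}")
```

Under the same assumption, about 90% of patients would carry at least one F508 allele.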

Given that the carrier frequency in the North American population is approximately 1 in 20, it is feasible to screen all women and/or men of child-bearing age, for example, for their carrier status. Carrier detection using probes specific for the F508 deletion will pick up 70% of the carriers. The remaining carriers will be detected by a battery of probes specific for the various haplotype groups identified above.

Since the F508 deletion constitutes about 70% of all CF mutations, RFLP analysis may be used as a supplement to direct deletion testing for family members or close relatives of CF patients. About 55% of the CF parents not carrying the F508 mutation are expected to be informative for the DNA marker JG2E1 (KM19) [Kerem et al, Am. J. Hum. Genet. 44:827-834 (1989); Estivill et al, Genomics 1:257 (1987)], based on retrospective analysis of our CF linkage families; an additional 39% would be informative if E6 (Taq I) [Kerem et al, supra] and J3.11 (Msp I) [Wainwright et al, Nature (1985)] were also tested; virtually all parents would be informative if H2.3 (XV2C-Taq I) [Kerem et al, supra; Estivill et al, Nature (1987)], E2.6 (E.9) (Msp I) [probe available on request], E4.1 (Mp6d.9) (Msp I) [probe available upon request; Estivill et al, Am. J. Hum. Genet. (1989)], J44 (E3.1) (Xba I) [probe available on request] and metD (Ban I) [Spence et al, Am. J. Hum. Genet. (1986); ATCC #40219] were included.

The utility of these probes lies in the fact that they recognize polymorphic restriction sites. Thus, the probes are typically not defined by their sequence across the particular polymorphic site but, rather, can be utilized based on knowledge of flanking sequences, allowing for polymerase chain reaction (PCR) generation of the region in question, as would be known by one skilled in the art.

For example, the probe E2.6 (Msp I) is completely defined by two flanking oligomers: 5'GTGATCCAGTTTGCTCTCCA3' and 5'GGAATCACTCTTCCTGATAT3'. Use of this E2.6 PCR-generated probe to detect an Msp I polymorphism will detect two different alleles: either one 850 bp fragment (Msp I site absent) or a 490 bp and a 360 bp fragment (Msp I site present).

Similarly, the probe J44 (E3.1) (Xba I) is completely defined by two flanking oligomers: 5'CAATGTGATTGGTGAAACTA3' and 5'CTTCTCCTCCTAGACACCTGCAT3'. Use of this J44 (E3.1) PCR-generated probe to detect an Xba I polymorphism will detect two different alleles: either an 860 bp fragment (Xba I site absent) or a 610 bp and a 250 bp fragment (Xba I site present).
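The fragment sizes above determine which bands each genotype should show. The Python sketch below assumes, as the two-fragment pattern implies, that each amplified region contains a single polymorphic site; the allele labels and function name are ours.

```python
# Fragment sizes (bp) reported in the text for each allele of the two PCR-generated probes.
RFLP_ALLELES = {
    "E2.6 (Msp I)": {"site present": (490, 360), "site absent": (850,)},
    "J44 (E3.1) (Xba I)": {"site present": (610, 250), "site absent": (860,)},
}


def expected_bands(marker: str, allele1: str, allele2: str) -> list:
    """Distinct fragment sizes expected for a diploid genotype, largest first."""
    alleles = RFLP_ALLELES[marker]
    return sorted(set(alleles[allele1]) | set(alleles[allele2]), reverse=True)


if __name__ == "__main__":
    for marker, alleles in RFLP_ALLELES.items():
        # A cut allele's two fragments should add up to the uncut fragment.
        assert sum(alleles["site present"]) == sum(alleles["site absent"])
        print(marker, expected_bands(marker, "site present", "site absent"))
```

A heterozygote shows all three bands, which is what makes a parent informative for linkage with that marker.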

The linked RFLPs may also be used in risk calculation for individuals who do not carry the F508 deletion. A general risk estimate procedure has been discussed in Beaudet et al, Am. J. Hum. Genet. 44:319-326.
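The cited risk procedure is not reproduced here. As a generic illustration of the kind of calculation involved, the Python sketch below applies Bayes' rule to figures quoted in this section (carrier frequency about 1 in 20, 70% of carriers detected via F508), assuming the test has no false positives; it ignores the linked-RFLP information that would refine the estimate in a family setting.

```python
from fractions import Fraction


def residual_carrier_risk(prior: Fraction, detection_rate: Fraction) -> Fraction:
    """Probability that a person who tests negative is nevertheless a carrier.

    Bayes' rule: P(carrier | negative) =
        P(neg | carrier) P(carrier) / [P(neg | carrier) P(carrier) + P(non-carrier)],
    assuming the test never reports a non-carrier as a carrier.
    """
    missed = prior * (1 - detection_rate)
    return missed / (missed + (1 - prior))


if __name__ == "__main__":
    prior = Fraction(1, 20)       # carrier frequency quoted in the text
    detection = Fraction(7, 10)   # share of carriers detected by the F508 probes
    r = residual_carrier_risk(prior, detection)
    print(f"residual carrier risk: {r} (about 1 in {round(1 / r)})")
    # Partner is a confirmed carrier; a child is affected only if both pass a mutation.
    print(f"risk for a child: {r / 4} (about 1 in {round(4 / r)})")
```

With these inputs a negative result lowers the carrier risk from 1 in 20 to 3/193 (about 1 in 64), and a child of that person and a confirmed carrier has a risk of about 1 in 257.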

For prenatal diagnosis, microvillar intestinal enzyme analysis (Brock, Lancet 2: 941 (1983)) may be performed to increase the confidence of diagnosis in cases where DNA diagnosis is inconclusive.

DNA diagnosis is currently being used to assess whether a fetus will be born with cystic fibrosis, but historically this has only been done after a particular set of parents has already had one cystic fibrosis child, which identifies them as obligate carriers. However, in combination with carrier detection as outlined above, DNA diagnosis for all pregnancies of carrier couples will be possible. If the parents have already had a cystic fibrosis child, an extended haplotype analysis can be done on the fetus, and thus the percentage of false positives or false negatives will be greatly reduced. If the parents have not already had an affected child and the DNA diagnosis on the fetus is being performed on the basis of carrier detection, haplotype analysis can still be performed.

Although it has been thought for many years that there is a great deal of clinical heterogeneity in the cystic fibrosis disease, it is now emerging that there are two general categories, called pancreatic sufficiency (CF-PS) and pancreatic insufficiency (CF-PI). If the mutations related to these disease categories are well characterized, one can associate a particular mutation with a clinical phenotype of the disease. This allows the treatment of each patient to be adjusted accordingly. Thus the nature of the mutation will, to a certain extent, predict the prognosis of the patient and indicate a specific treatment.

MOLECULAR BIOLOGY OF CYSTIC FIBROSIS

The postulate that CFTR may regulate the activity of ion channels, particularly the outwardly rectifying Cl- channel implicated as the functional defect in CF, can be tested by the injection and translation of full-length in vitro transcribed CFTR mRNA in Xenopus oocytes. The ensuing changes in ion currents across the oocyte membrane can be measured as the potential is clamped at a fixed value. CFTR may regulate endogenous oocyte channels, or it may be necessary to also introduce epithelial cell RNA to direct the translation of channel proteins. Use of mRNA coding for normal and for mutant CFTR, as provided by this invention, makes these experiments possible.

Other modes of expression in heterologous cell systems also facilitate dissection of structure-function relationships. The complete CFTR DNA sequence ligated into a plasmid expression vector is used to transfect cells so that its influence on ion transport can be assessed. Plasmid expression vectors containing part of the normal CFTR sequence along with portions of modified sequence at selected sites can be used in in vitro mutagenesis experiments performed in order to identify those portions of the CFTR protein which are crucial for regulatory function.

EXPRESSION OF DNA SEQUENCE

The DNA sequence can be manipulated in studies to understand the expression of the gene and its product, and to achieve production of large quantities of the protein for functional analysis, antibody production, and patient therapy. The changes in the sequence may or may not alter the expression pattern in terms of relative quantities, tissue-specificity and functional properties. The partial or full-length cDNA sequences, which encode the subject protein, unmodified or modified, may be ligated to bacterial expression vectors such as the pRIT (Nilsson et al. EMBO J. 4: 1075-1080 (1985)), pGEX (Smith and Johnson, Gene 67: 31-40 (1988)) or pATH (Spindler et al. J. Virol. 49: 132-141 (1984)) plasmids, which can be introduced into E. coli cells for production of the corresponding proteins, which may be isolated in accordance with the previously discussed protein purification procedures. The DNA sequence can also be transferred from its existing context to other cloning vehicles, such as other plasmids, bacteriophages, cosmids, animal viruses, yeast artificial chromosomes (YAC) (Burke et al. Science 236: 806-812 (1987)), somatic cells, and other simple or complex organisms, such as bacteria, fungi (Timberlake and Marshall, Science 1313-1317 (1989)), invertebrates, plants (Gasser and Fraley, Science 244: 1293 (1989)), and pigs (Pursel et al. Science 244: 1281-1288 (1989)).

For expression in mammalian cells, the cDNA sequence may be ligated to heterologous promoters, such as the simian virus (SV) 40 promoter in the pSV2 vector [Mulligan and Berg, Proc. Natl. Acad. Sci. USA, 78:2072-2076 (1981)], and introduced into cells, such as monkey COS-1 cells [Gluzman, Cell, 23:175-182 (1981)], to achieve transient or long-term expression. The stable integration of the chimeric gene construct may be maintained in mammalian cells by biochemical selection, such as with neomycin [Southern and Berg, J. Mol. Appl. Genet. 1:327-341 (1982)] and mycophenolic acid [Mulligan and Berg, supra].

DNA sequences can be manipulated with standard procedures such as restriction enzyme digestion, fill-in with DNA polymerase, deletion by exonuclease, extension by terminal deoxynucleotide transferase, ligation of synthetic or cloned DNA sequences, and site-directed sequence alteration via a single-stranded bacteriophage intermediate or with the use of specific oligonucleotides in combination with PCR.

The cDNA sequence (or portions derived from it), or a mini gene (a cDNA with an intron and its own promoter), is introduced into eukaryotic expression vectors by conventional techniques. These vectors are designed to permit the transcription of the cDNA in eukaryotic cells by providing regulatory sequences that initiate and enhance the transcription of the cDNA and ensure its proper splicing and polyadenylation. Vectors containing the promoter and enhancer regions of the simian virus (SV) 40 or the long terminal repeat (LTR) of the Rous Sarcoma virus, and polyadenylation and splicing signals from SV40, are readily available [Mulligan et al, Proc. Natl. Acad. Sci. USA 78:2072-2076 (1981); Gorman et al, Proc. Natl. Acad. Sci. USA 79: 6777-6781 (1982)]. Alternatively, the CFTR endogenous promoter may be used. The level of expression of the cDNA can be manipulated with this type of vector, either by using promoters that have different activities (for example, the baculovirus pAc373 can express cDNAs at high levels in S. frugiperda cells [M. D. Summers and G. E. Smith in, Genetically Altered Viruses and the Environment (B. Fields, et al, eds.) vol. 22, pp. 319-328, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1985]) or by using vectors that contain promoters amenable to modulation, for example the glucocorticoid-responsive promoter from the mouse mammary tumor virus [Lee et al, Nature 294:228 (1982)]. The expression of the cDNA can be monitored in the recipient cells 24 to 72 hours after introduction (transient expression).

In addition, some vectors contain selectable markers, such as the gpt [Mulligan and Berg, supra] or neo [Southern and Berg, J. Mol. Appl. Genet. 1:327-341 (1982)] bacterial genes, that permit isolation of cells, by chemical selection, that have stable, long-term expression of the vectors (and therefore the cDNA) in the recipient cell. The vectors can be maintained in the cells as episomal, freely replicating entities by using regulatory elements of viruses such as papilloma [Sarver et al, Mol. Cell Biol. 1:486 (1981)] or Epstein-Barr [Sugden et al, Mol. Cell Biol. 5:410 (1985)]. Alternatively, one can also produce cell lines that have integrated the vector into genomic DNA. Both of these types of cell lines produce the gene product on a continuous basis. One can also produce cell lines that have amplified the number of copies of the vector (and therefore of the cDNA as well) to create cell lines that can produce high levels of the gene product [Alt et al, J. Biol. Chem. 253: 1357 (1978)].

The transfer of DNA into eukaryotic cells, in particular human or other mammalian cells, is now a conventional technique. The vectors are introduced into the recipient cells as pure DNA (transfection) by, for example, precipitation with calcium phosphate [Graham and van der Eb, Virology 52:466 (1973)] or strontium phosphate [Brash et al, Mol. Cell Biol. 7:2013 (1987)], electroporation [Neumann et al, EMBO J. 1:841 (1982)], lipofection [Felgner et al, Proc. Natl. Acad. Sci. USA 84:7413 (1987)], DEAE dextran [McCutchan et al, J. Natl. Cancer Inst. 41:351 (1968)], microinjection [Mueller et al, Cell 15:579 (1978)], protoplast fusion [Schaffner, Proc. Natl. Acad. Sci. USA 72:2163] or pellet guns [Klein et al, Nature 327: 70 (1987)]. Alternatively, the cDNA can be introduced by infection with virus vectors. Systems have been developed that use, for example, retroviruses [Bernstein et al, Genetic Engineering 7: 235 (1985)], adenoviruses [Ahmad et al, J. Virol. 57:267 (1986)] or Herpes virus [Spaete et al, Cell 30:295 (1982)].

These eukaryotic expression systems can be used for many studies of the CF gene and the CFTR product. These include, for example: (1) determination that the gene is properly expressed and that all post-translational modifications necessary for full biological activity have been properly completed; (2) identification of regulatory elements located in the 5' region of the CF gene and of their role in the tissue- or temporal-regulation of the expression of the CF gene; (3) production of large amounts of the normal protein for isolation and purification; (4) use of cells expressing the CFTR protein as an assay system for antibodies generated against the CFTR protein or as an assay system to test the effectiveness of drugs; and (5) study of the function of the normal complete protein, of specific portions of the protein, or of naturally occurring or artificially produced mutant proteins. Naturally occurring mutant proteins exist in patients with CF, while artificially produced mutant proteins can be designed by site-directed sequence alterations. These latter studies can probe the function of any desired amino acid residue in the protein by mutating the nucleotides coding for that amino acid.
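The correspondence between an amino acid residue and the nucleotides that code for it can be illustrated with a short sketch. It assumes an intronless coding sequence (such as a cDNA) whose first codon encodes residue 1; the function names `codon_span` and `mutate_codon` and the toy sequence are hypothetical and do not represent CFTR sequence.

```python
from __future__ import annotations


def codon_span(residue: int) -> tuple[int, int]:
    """Return the 0-based [start, end) slice of the codon for a 1-based residue."""
    if residue < 1:
        raise ValueError("residue numbering starts at 1")
    start = 3 * (residue - 1)
    return start, start + 3


def mutate_codon(cds: str, residue: int, new_codon: str) -> str:
    """Return a copy of an intronless coding sequence with one codon replaced."""
    new_codon = new_codon.upper()
    if len(new_codon) != 3 or set(new_codon) - set("ACGT"):
        raise ValueError("new_codon must be three bases drawn from A, C, G, T")
    start, end = codon_span(residue)
    if end > len(cds):
        raise ValueError("residue lies beyond the end of the coding sequence")
    return cds[:start] + new_codon + cds[end:]


# Hypothetical toy coding sequence ATG AAA TTT GGT (Met-Lys-Phe-Gly), not CFTR.
toy = "ATGAAATTTGGT"
print(codon_span(3))                # (6, 9)
print(mutate_codon(toy, 3, "tgt"))  # ATGAAATGTGGT, i.e. Phe3 changed to Cys3
```

In an actual site-directed mutagenesis experiment the altered codon would be introduced with a mutagenic oligonucleotide; the sketch only shows which nucleotides correspond to a chosen residue.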

Using the above techniques, expression vectors containing the CF gene sequence or fragments thereof can be introduced into human cells, mammalian cells from other species or non-mammalian cells as desired. The choice of cell is determined by the purpose of the treatment. For example, monkey COS cells [Gluzman, Cell 23:175 (1981)], which produce high levels of the SV40 T antigen and permit the replication of vectors containing the SV40 origin of replication, can be used to show that the vector can express the protein product, since function is not required for that purpose. Similar treatment could be performed with Chinese hamster ovary (CHO) or mouse NIH 3T3 fibroblasts or with human fibroblasts or lymphoblasts.

The recombinant cloning vector, according to this invention, then comprises the selected DNA of the DNA sequences of this invention for expression in a suitable host. The DNA is operatively linked in the vector to an expression control sequence in the recombinant DNA molecule so that normal CFTR polypeptide can be expressed. The expression control sequence may be selected from the group consisting of sequences that control the expression of genes of prokaryotic or eukaryotic cells and their viruses and combinations thereof. The expression control sequence may be specifically selected from the group consisting of the lac system, the trp system, the tac system, the trc system, major operator and promoter regions of phage lambda, the control region of fd coat protein, the early and late promoters of SV40, promoters derived from polyoma, adenovirus, retrovirus, baculovirus and simian virus, the promoter for 3-phosphoglycerate kinase, the promoters of yeast acid phosphatase, the promoter of the yeast alpha-mating factors and combinations thereof.

The host cell, which may be transfected with the vector of this invention, may be selected from the group consisting of E. coli, Pseudomonas, Bacillus subtilis, Bacillus stearothermophilus or other bacilli; other bacteria; yeast; fungi; insect; mouse or other animal; or plant hosts; or human tissue cells.

It is appreciated that for the mutant DNA sequence 
similar systems are employed to express and produce the 
mutant product. 

5.2 PROTEIN FUNCTION CONSIDERATIONS

To study the function of the CFTR protein, it is preferable to use epithelial cells as recipients, since proper functional expression may require the presence of other pathways or gene products that are only expressed in such cells. Cells that can be used include, for example, human epithelial cell lines such as T84 (ATCC #CCL 248) or PANC-1 (ATCC #CRL 1469), or the T43 immortalized CF nasal epithelium cell line [Jetten et al. Science (1989)], and primary [Yankaskas et al. Am. Rev. Resp. Dis. 132:1281 (1985)] or transformed [Scholte et al. Exp. Cell Res. 182:559 (1989)] human nasal polyp or airway cells, pancreatic cells [Harris and Coleman J. Cell Sci. 87:695 (1987)], or sweat gland cells [Collie et al. In Vitro 21:597 (1985)] derived from normal or CF subjects. The CF cells can be used to test for the functional activity of mutant CF genes. Currently available functional assays include the study of the movement of anions (Cl- or I-) across cell membranes as a function of stimulation of cells by agents that raise intracellular cAMP levels and activate chloride channels [Stutts et al. Proc. Natl. Acad. Sci. USA 82:6677 (1985)]. Other assays include the measurement of changes in cellular potentials by patch clamping of whole cells or of isolated membranes [Frizzell et al. Science 233:558 (1986); Welsh and Liedtke, Nature 322:467 (1986)] or the study of ion fluxes in epithelial sheets of confluent cells [Widdicombe et al. Proc. Natl. Acad. Sci. USA 82:6167 (1985)]. Alternatively, RNA made from the CF gene could be injected into Xenopus oocytes, which will translate the RNA into protein and allow its study. As other, more specific assays are developed, these can also be used in the study of transfected CFTR protein function.

"Domain-switching" experiments between CFTR and the human multidrug resistance P-glycoprotein can also be performed to further the study of the CFTR protein. In these experiments, plasmid expression vectors are constructed by routine techniques from fragments of the CFTR sequence and fragments of the P-glycoprotein sequence, ligated together by DNA ligase so that a host cell transfected with the plasmid will synthesize a protein containing the respective portions of these two proteins. This approach has the advantage that many experimental parameters associated with multidrug resistance can be measured. Hence, it is now possible to assess the ability of segments of CFTR to influence these parameters.
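For the host cell to synthesize a protein containing portions of both proteins, the ligated junction has to preserve the reading frame. The sketch below models only that bookkeeping, assuming both inputs are intronless coding sequences cut at codon boundaries; the function name `fuse_in_frame` and the toy sequences are hypothetical, and the physical construction would still rely on DNA fragments joined by DNA ligase as described above.

```python
def fuse_in_frame(cds_a: str, last_codon_a: int, cds_b: str, first_codon_b: int) -> str:
    """Join codons 1..last_codon_a of cds_a to codons first_codon_b..end of cds_b.

    Cutting both sequences at codon boundaries keeps the downstream portion
    in its original reading frame, so the fused gene encodes a chimeric protein.
    """
    if len(cds_a) % 3 or len(cds_b) % 3:
        raise ValueError("each coding sequence must be a whole number of codons")
    if not 1 <= last_codon_a <= len(cds_a) // 3:
        raise ValueError("last_codon_a is out of range")
    if not 1 <= first_codon_b <= len(cds_b) // 3:
        raise ValueError("first_codon_b is out of range")
    return cds_a[: 3 * last_codon_a] + cds_b[3 * (first_codon_b - 1):]


# Hypothetical toy fragments standing in for CFTR and P-glycoprotein segments.
toy_a = "ATGAAAGAAAAT"  # Met-Lys-Glu-Asn
toy_b = "ATGGGTGTTTAA"  # Met-Gly-Val-stop
print(fuse_in_frame(toy_a, 3, toy_b, 2))  # ATGAAAGAAGGTGTTTAA (Met-Lys-Glu-Gly-Val-stop)
```

A junction that is not a multiple of three bases from the start codon would shift the frame of the downstream segment, which is why the cut points are expressed here in codons rather than nucleotides.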

These studies of the influence of CFTR on ion transport will serve to bring the field of epithelial transport into the molecular arena. CFTR is the first transport-related molecule from epithelial cells for which the complete primary structure is disclosed. Knowledge of CFTR can be used to better understand, at a molecular level, the characteristics of the epithelial cell membrane. For example, the molecules in closest proximity to CFTR can be determined by cross-linking experiments; the hypothesis that the role of CFTR is to regulate ion channels would predict that such channels are among these neighboring molecules. The large, high-quality cDNA libraries constructed for the cloning of CFTR cDNAs will also be useful for the molecular cloning of cDNAs for polypeptides constituting other epithelial ion transport systems, including other channels as well as co-, counter-, and active-transport systems.

THERAPIES

It is understood that the major aim of the various biochemical studies using the compositions of this invention is the development of therapies to circumvent or overcome the CF defect, using both the pharmacological and the "gene-therapy" approaches.

In the pharmacological approach, drugs which circumvent or overcome the CF defect are sought. Initially, compounds may be tested essentially at random, and screening systems are required to discriminate among many candidate compounds. This invention provides host cell systems expressing various mutant CF genes, which are particularly well suited for use as first-level screening systems. Preferably, a cell culture system using mammalian cells (most preferably human cells) transfected with an expression vector comprising a DNA sequence coding for CFTR protein containing a CF-generating mutation, for example the F508 deletion, is used in the screening process. Candidate drugs are tested by incubating the cells in the presence of the candidate drug and measuring those cellular functions dependent on CFTR, especially by measuring ion currents while the transmembrane potential is clamped at a fixed value. To accommodate the large number of assays, however, more convenient assays can be based, for example, on ion-sensitive fluorescent dyes. To detect changes in Cl- ion concentration, SPQ or its analogues are useful.
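Chloride-sensitive dyes of the SPQ type report chloride through collisional quenching, which is commonly described by the Stern-Volmer relation F0/F = 1 + K_SV[Cl-], where F0 is the fluorescence in the absence of chloride and F the measured fluorescence. A minimal sketch of converting readings into an estimated chloride concentration follows; the constant `K_SV_ASSUMED`, the function name and the readings are placeholder assumptions for illustration only, since the constant has to be calibrated for the dye and cell type actually used.

```python
# Placeholder Stern-Volmer constant in L/mol. This is an assumed value for
# illustration only; it must be calibrated for the dye and cells actually used.
K_SV_ASSUMED = 13.0


def chloride_from_fluorescence(f0: float, f: float, k_sv: float = K_SV_ASSUMED) -> float:
    """Estimate [Cl-] in mol/L by inverting F0/F = 1 + K_SV * [Cl-]."""
    if f0 <= 0 or f <= 0 or k_sv <= 0:
        raise ValueError("f0, f and k_sv must all be positive")
    return (f0 / f - 1.0) / k_sv


# Hypothetical readings from one well: chloride-free F0 = 100, F = 40 before
# stimulation and F = 50 afterwards (the dye dequenches as Cl- leaves the cell).
before = chloride_from_fluorescence(100.0, 40.0)
after = chloride_from_fluorescence(100.0, 50.0)
print(f"[Cl-] {before * 1000:.1f} mM -> {after * 1000:.1f} mM")  # [Cl-] 115.4 mM -> 76.9 mM
```

In a screen of the kind described above, the quantity of interest would be the change in estimated chloride after stimulation, compared between mutant-CFTR cells incubated with and without a candidate drug.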

Alternatively, a cell-free system could be used. Purified CFTR could be reconstituted into artificial membranes, and drugs could be screened in a cell-free assay [Al-Awqati, Science (1989)].

At the second level, animal testing is required. It is possible to develop a model of CF by interfering with the normal expression of the counterpart of the CF gene in an animal such as the mouse. The "knock-out" of this gene by introducing a mutant form of it into the germ line of animals will provide a strain of animals with CF-like syndromes. This enables testing of drugs which showed promise in the first-level, cell-based screen.

As further knowledge is gained about the nature of the protein and its function, it will be possible to predict structures of proteins or other compounds that interact with the CFTR protein. That in turn will allow predictions to be made about potential drugs that will interact with this protein and have some effect in the treatment of patients. Ultimately, such drugs may be designed and synthesized chemically on the basis of structures predicted to be required to interact with domains of CFTR. This approach is reviewed in Capsey and Delvatte, Genetically Engineered Human Therapeutic Drugs, Stockton Press, New York, 1988. These potential drugs must also be tested in the screening system.

5.3.1 PROTEIN REPLACEMENT THERAPY

Treatment of CF can be performed by replacing the defective protein with normal protein, by modulating the function of the defective protein or by modifying another step in the pathway in which CFTR participates in order to correct the physiological abnormality.

To be able to replace the defective protein with the normal version, one must have reasonably large amounts of pure CFTR protein. Pure protein can be obtained from cultured cell systems as described earlier. Delivery of the protein to the affected airway tissue will require its packaging in lipid-containing vesicles that facilitate the incorporation of the protein into the cell membrane. It may also be feasible to use vehicles that incorporate proteins such as surfactant protein, for example SAP(Val) or SAP(Phe), that perform this function naturally, at least for lung alveolar cells (PCT Patent Application WO/8803170, Whitsett et al., May 7, 1988 and PCT Patent Application WO89/04327, Benson et al., May 18, 1989). The CFTR-containing vesicles are introduced into the airways by inhalation or irrigation, techniques that are currently used in CF treatment (Boat et al., supra).

5.3.2 DRUG THERAPY

Modulation of CFTR function can be accomplished by the use of therapeutic agents (drugs). These can be identified by random approaches using a screening program in which their effectiveness in modulating the defective CFTR protein is monitored in vitro. Screening programs can use cultured cell systems in which the defective CFTR protein is expressed. Alternatively, drugs can be designed to modulate CFTR activity from knowledge of the structure-function correlations of the CFTR protein and of the specific defect in the various CFTR mutant proteins (Capsey and Delvatte, supra). It is possible that each mutant CFTR protein will require a different drug for specific modulation. It will then be necessary to identify the specific mutation(s) in each CF patient before initiating drug therapy.

Drugs can be designed to interact with different aspects of CFTR protein structure or function. For example, a drug (or antibody) can bind to a structural fold of the protein to correct a defective structure. Alternatively, a drug might bind to a specific functional residue and increase its affinity for a substrate or cofactor. Since members of the class of proteins to which CFTR is structurally homologous are known to interact with, bind and transport a variety of drugs, it is reasonable to expect that drug-related therapies may be effective in the treatment of CF.

A third mechanism by which a drug could enhance CFTR activity would be to modulate the production or the stability of CFTR inside the cell. The resulting increase in the amount of CFTR could compensate for its defective function.

Drug therapy can also be used to compensate for the defective CFTR function by interactions with other components of the physiological or biochemical pathway necessary for the expression of the CFTR function. These interactions can lead to increases or decreases in the activity of these ancillary proteins. The methods for the identification of these drugs would be similar to those described above for CFTR-related drugs.

In other genetic disorders, it has been possible to correct for the consequences of altered or missing normal functions by dietary modifications. This has taken the form either of removal of metabolites, as in the case of phenylketonuria, where phenylalanine is removed from the diet in the first five years of life to prevent mental retardation, or of the addition of large amounts of metabolites to the diet, as in the case of adenosine deaminase deficiency, where functional correction of the activity of the enzyme can be produced by the addition of the enzyme to the diet. Thus, once the details of CFTR function have been elucidated and the basic defect in CF has been defined, therapy may be achieved by dietary manipulations.

The second potential therapeutic approach is so-called "gene therapy", in which normal copies of the CF gene are introduced into patients so as to code for normal protein in the key epithelial cells of affected tissues. It is most crucial to achieve this with the airway epithelial cells of the respiratory tract. The CF gene is delivered to these cells in a form in which it can be taken up and code for sufficient protein to provide regulatory function. As a result, the patient's quality of life will be improved and its length greatly extended. Ultimately, of course, the aim is to deliver the gene to all affected tissues.

5.3.3 GENE THERAPY

One approach to therapy of CF is to insert a normal version of the CF gene into the airway epithelium of affected patients. It is important to note that respiratory disease is the primary cause of morbidity and mortality in CF; pancreatic disease, while a major feature, is relatively well treated today with enzyme supplementation. Thus, somatic cell gene therapy [for a review, see T. Friedmann, Science 244:1275 (1989)] targeting the airway would alleviate the most severe problems associated with CF.

A. Retroviral Vectors. Retroviruses have been considered the preferred vector for experiments in somatic gene therapy, with a high efficiency of infection and stable integration and expression [Orkin et al. Prog. Med. Genet. 7:130 (1988)]. A possible drawback is that cell division is necessary for retroviral integration, so that the targeted cells in the airway may have to be nudged into the cell cycle prior to retroviral infection, perhaps by chemical means. The full-length CF gene cDNA can be cloned into a retroviral vector and driven from either its endogenous promoter or from the retroviral LTR (long terminal repeat). Expression of the normal protein at levels as low as 10% of those of the endogenous mutant protein in CF patients would be expected to be beneficial, since this is a recessive disease. Delivery of the virus could be accomplished by aerosol or by instillation into the trachea.

B. Other Viral Vectors. Other delivery systems which can be utilized include adeno-associated virus [AAV; McLaughlin et al. J. Virol. 62:1963 (1988)], vaccinia virus [Moss et al. Annu. Rev. Immunol. 5:305 (1987)], bovine papilloma virus [Rasmussen et al. Methods Enzymol. 139:642 (1987)] or members of the herpesvirus group such as Epstein-Barr virus [Margolskee et al. Mol. Cell. Biol. 8:2937 (1988)]. Although much would need to be learned about their basic biology, the use of a viral vector with natural tropism for the respiratory tree (e.g., respiratory syncytial virus, echovirus or Coxsackie virus) is also possible.

C. Non-viral Gene Transfer. Other methods of inserting the CF gene into respiratory epithelium may also be productive; many of these are of lower efficiency and would potentially require transfection in vitro, selection of transfectants, and reimplantation. These would include calcium phosphate, DEAE dextran, electroporation, and protoplast fusion. A particularly attractive idea is the use of liposomes, which might make it possible to carry out gene transfer in vivo [Ostro, Liposomes, Marcel Dekker, 1987]. Synthetic cationic lipids such as DOTMA [Felgner et al. Proc. Natl. Acad. Sci. USA 84:7413 (1987)] may increase the efficiency and ease of carrying out this approach.

The creation of a mouse or other animal model for CF will be crucial to understanding the disease and to testing possible therapies [for a general review of creating animal models, see Erickson, Am. J. Hum. Genet. 43:582 (1988)]. Currently no animal model of CF exists. The evolutionary conservation of the CF gene, as demonstrated by the cross-species hybridization blots for E4.3 and H1.6 shown in Figure 4, indicates that an orthologous gene exists in the mouse (hereafter denoted mCF, and its corresponding protein mCFTR), and that it will be possible to clone this gene from mouse genomic and cDNA libraries using the human CF gene probes. It is expected that generating a specific mutation in the mouse gene analogous to the F508 mutation will be the best way to reproduce the phenotype, though complete inactivation of the mCFTR gene will also be a useful mutant to generate.
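The effect of the F508 mutation at the protein level can be made concrete with the two probe sequences recited in claims 6 and 25 (normal: AAA GAA AAT ATC ATC TTT GGT GTT; mutant: AAA GAA AAT ATC ATT GGT GTT). The sketch below translates both with the standard genetic code; the helper name `translate` is hypothetical. It shows that the mutant sequence, three bases shorter, lacks a single phenylalanine, while its fifth codon ATT still encodes isoleucine.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered by first, second, third base in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA sequence (spaces between codons allowed)."""
    seq = "".join(cds.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


normal = "AAA GAA AAT ATC ATC TTT GGT GTT"  # probe of claim 6
mutant = "AAA GAA AAT ATC ATT GGT GTT"      # probe of claim 25
print(translate(normal))  # KENIIFGV
print(translate(mutant))  # KENIIGV
```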

A. Mutagenesis. Inactivation of the mCF gene can be achieved by chemical [e.g., Johnson et al. Proc. Natl. Acad. Sci. USA 78:3138 (1981)] or X-ray mutagenesis [Popp et al. J. Mol. Biol. 127:141 (1979)] of mouse gametes, followed by fertilization. Offspring heterozygous for inactivation of mCFTR can then be identified by Southern blotting, to demonstrate loss of one allele by dosage, or by failure to inherit one parental allele if an RFLP marker is being assessed. This approach has previously been used successfully to identify mouse mutants for α-globin [Whitney et al. Proc. Natl. Acad. Sci. USA 77:1087 (1980)], phenylalanine hydroxylase [McDonald et al. Pediatr. Res. 23:63 (1988)], and carbonic anhydrase II [Lewis et al. Proc. Natl. Acad. Sci. USA 85:1962 (1988)].

B. Transgenics. A normal or mutant version of CFTR or mCFTR can be inserted into the mouse germ line using now-standard techniques of oocyte injection [Camper, Trends in Genetics (1988)]; alternatively, if it is desirable to inactivate or replace the endogenous mCF gene, the homologous recombination system using embryonic stem (ES) cells [Capecchi, Science 244:1288 (1989)] may be applied.

1. Oocyte Injection. Placing one or more copies of the normal or mutant mCF gene at a random location in the mouse germline can be accomplished by microinjection of the pronucleus of a just-fertilized mouse oocyte, followed by reimplantation into a pseudopregnant foster mother. The liveborn mice can then be screened for integrants by analysis of tail DNA for the presence of human CF gene sequences. The same protocol can be used to insert a mutant mCF gene. To generate a mouse model, one would want to place this transgene in a mouse background where the endogenous mCF gene has been inactivated, either by mutagenesis (see above) or by homologous recombination (see below). The transgene can be any one of: a) a complete genomic sequence, though its size (about 250 kb) would require that it be injected as a yeast artificial chromosome or a chromosome fragment; b) a cDNA with either the natural promoter or a heterologous promoter; or c) a "minigene" containing all of the coding region and various other elements, such as introns, promoter, and 3' flanking elements, found to be necessary for optimum expression.

This alternative involves inserting the CFTR or mCF gene into a retroviral vector and directly infecting mouse embryos at early stages of development, generating a chimera [Soriano et al. Cell 46:19 (1986)]. At least some of these chimeras will lead to germline transmission.

The embryonic stem cell approach [Capecchi, supra, and Capecchi, Trends Genet. 5:70 (1989)] allows the possibility of performing gene transfer and then screening the resulting totipotent cells to identify the rare homologous recombination events. Once identified, these can be used to generate chimeras by injection into mouse blastocysts, and a proportion of the resulting mice will show germline transmission from the recombinant line. There are several ways this could be useful in the generation of a mouse model for CF:

a) Inactivation of the mCF gene can be conveniently accomplished by designing a DNA fragment which contains sequences from an mCFTR exon flanking a selectable marker such as neo. Homologous recombination will lead to insertion of the neo sequences in the middle of an exon, inactivating mCFTR. The homologous recombination events (usually about 1 in 1000) can be distinguished from the heterologous ones by DNA analysis of individual clones [usually using PCR; Kim et al. Nucleic Acids Res. 16:8887 (1988); Joyner et al. Nature 338:153 (1989); Zimmer et al. supra, p. 150] or by using a negative selection against the heterologous events [such as the use of an HSV TK gene at the end of the construct, followed by ganciclovir selection; Mansour et al. Nature 336:348 (1988)]. This inactivated mCFTR mouse can then be used to introduce a mutant CF gene or mCF gene containing the F508 abnormality or any other desired mutation.

b) It is possible that specific mutants of mCFTR can be created in one step. For example, one can make a construct containing mCF intron 9 sequences at the 5' end, a selectable neo gene in the middle, and intron 9 + exon 10 (containing the mouse version of the F508 mutation) at the 3' end. A homologous recombination event would lead to the insertion of the neo gene in intron 9 and the replacement of exon 10 with the mutant version.

c) If the presence of the selectable neo marker in the intron altered expression of the mCF gene, it would be possible to excise it in a second homologous recombination step.

d) It is also possible to create mutations in the mouse germline by injecting oligonucleotides containing the mutation of interest and screening the resulting cells by PCR.
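Because homologous recombination events are rare (about 1 in 1000, as stated in item a), the number of clones that must be analyzed can be estimated with simple probability. The sketch below assumes that each analyzed clone is independently a homologous recombinant with a fixed frequency; the function names `prob_at_least_one` and `clones_needed` are hypothetical.

```python
import math


def prob_at_least_one(n_clones: int, freq: float) -> float:
    """Probability that n independent clones include at least one homologous recombinant."""
    return 1.0 - (1.0 - freq) ** n_clones


def clones_needed(freq: float, confidence: float) -> int:
    """Smallest number of clones giving at least `confidence` chance of one homologous event."""
    if not (0.0 < freq < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("freq and confidence must lie strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))
    # Step forward if floating-point rounding left n one short at the boundary.
    while prob_at_least_one(n, freq) < confidence:
        n += 1
    return n


freq = 1 / 1000  # "usually about 1 in 1000", as stated in item a)
print(clones_needed(freq, 0.95))                # 2995
print(round(prob_at_least_one(1000, freq), 3))  # 0.632
```

If negative selection against heterologous events raises the effective frequency of homologous recombinants among surviving clones, the same function can simply be called with that enriched frequency.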

This embodiment of the invention has considered primarily a mouse model for cystic fibrosis. Figure 4 shows cross-species hybridization not only to mouse DNA, but also to bovine, hamster and chicken DNA. It is thus contemplated that an orthologous gene exists in many other species as well, and that it will be possible to generate other animal models using similar technology.

Although preferred embodiments of the invention have been described herein in detail, it will be understood by those skilled in the art that variations may be made thereto without departing from the spirit of the invention or the scope of the appended claims.

1. A DNA molecule comprising an intronless DNA sequence selected from the group consisting of:

(a) DNA sequences which correspond to the DNA sequence of Figure 1 from amino acid residue position 1 to position 1480;

(b) DNA sequences encoding normal CFTR polypeptide having the sequence according to Figure 1 for amino acid residue positions from 1 to 1480;

(c) DNA sequences which correspond to a fragment of the sequence of Figure 1 including at least 16 sequential nucleotides between amino acid residue positions 1 and 1480;

(d) DNA sequences which comprise at least 16 nucleotides and encode a fragment of the amino acid sequence of Figure 1; and

(e) DNA sequences encoding an epitope encoded by at least 18 sequential nucleotides in the sequence of Figure 1 between amino acid residue positions 1 and 1480.



2. The DNA molecule of claim 1 wherein the DNA 
molecule is a cDNA molecule. 


3. A purified CF gene comprising a DNA sequence encoding an amino acid sequence for a protein, said protein, if expressed in its altered, defective or non-functional form in cells of the human body, being associated with altered cell function which correlates with the genetic disease, cystic fibrosis.



4. A purified RNA molecule comprising an RNA sequence corresponding to the DNA sequence recited in claim 1.

5. A purified nucleic acid probe comprising a DNA or RNA nucleotide sequence corresponding to the sequence recited in parts (c), (d) and (e) of claim 1.

6. A nucleic acid probe according to claim 5 wherein said sequence comprises AAA GAA AAT ATC ATC TTT GGT GTT, and its complement.

7. A recombinant cloning vector comprising the DNA molecule of claim 1.

8. A recombinant cloning vector comprising the DNA 
molecule of claim 3. 

9. The vector of claim 7 or 8 wherein said DNA molecule is operatively linked to an expression control sequence in said recombinant DNA molecule so that normal CFTR polypeptide can be expressed, said expression control sequence being selected from the group consisting of sequences that control the expression of genes of prokaryotic or eukaryotic cells and their viruses and combinations thereof.

10. The vector of claim 9 wherein the expression control sequence is selected from the group consisting of the lac system, the trp system, the tac system, the trc system, major operator and promoter regions of phage lambda, the control region of fd coat protein, the early and late promoters of SV40, promoters derived from polyoma, adenovirus, retrovirus, baculovirus and simian virus, the promoter for 3-phosphoglycerate kinase, the promoters of yeast acid phosphatase, the promoter of the yeast alpha-mating factors and combinations thereof.



11. A host transformed with the vector according to 
claim 7 or 8. 

12. The host of claim 11 selected from the group consisting of strains of E. coli, Pseudomonas, Bacillus subtilis, Bacillus stearothermophilus, or other bacilli; other bacteria; yeast; fungi; insect; mouse or other animal; or plant hosts; or human tissue cells.

13. The host of claim 12 wherein said human tissue 
cells are human epithelial cells. 



14. A method for producing a normal CFTR polypeptide comprising the steps of:

(a) culturing a host cell transfected with the vector of claim 13 in a medium and under conditions favorable for expression of normal CFTR polypeptide; and

(b) isolating the expressed normal CFTR polypeptide.

15. A purified mutant CF gene comprising a DNA sequence encoding an amino acid sequence for a protein, said protein being associated with altered cell function which correlates with the genetic disease, cystic fibrosis.

16. A purified mutant CF gene comprising a DNA sequence encoding an amino acid sequence for a protein, said protein, when expressed in its altered, defective or non-functional form in cells of the human body, being associated with altered cell function which correlates with the genetic disease, cystic fibrosis.

17. A DNA molecule comprising an intronless DNA sequence encoding a mutant CFTR polypeptide characterized by cystic fibrosis-associated activity in mammalian epithelial cells.

18. A DNA molecule comprising an intronless DNA sequence encoding a mutant CFTR polypeptide having the sequence according to Figure 1 for amino acid residue positions 1 to 1480, further characterized by a three base pair deletion which results in the deletion of phenylalanine from amino acid residue position 508.

19. A DNA molecule comprising an intronless DNA sequence selected from the group consisting of:

(a) DNA sequences which correspond to the sequence of claim 17 or 18 and which encode, on expression, mutant CFTR polypeptide;

(b) DNA sequences which correspond to a fragment of the sequences in claim 17 or 18 including at least 16 nucleotides;

(c) DNA sequences which comprise at least 16 nucleotides and encode a fragment of the amino acid sequence of claim 17 or 18; and

(d) DNA sequences encoding an epitope encoded by at least 18 sequential nucleotides in the sequence of claim 17 or 18.

20. The DNA molecule of claim 17 wherein the DNA molecule is a cDNA.

21. The DNA molecule of claim 18 wherein the DNA molecule is a cDNA.

22. The DNA molecule of claim 19 wherein the DNA molecule is a cDNA.

23. A purified RNA molecule comprising an RNA sequence corresponding to the DNA sequence recited in claim 19.

24. A purified nucleic acid probe comprising a DNA or RNA nucleotide sequence corresponding to the sequence recited in parts (b), (c), or (d) of claim 19.

25. A nucleic acid probe according to claim 24 wherein said sequence comprises AAA GAA AAT ATC ATT GGT GTT, and its complement.

26. A recombinant cloning vector comprising the DNA molecule of claim 19.

27. The vector of claim 26 wherein said DNA molecule is operatively linked to an expression control sequence in said recombinant DNA molecule so that mutant CFTR polypeptide can be expressed, said expression control sequence being selected from the group consisting of sequences that control the expression of genes of prokaryotic or eukaryotic cells and their viruses and combinations thereof.

28. The vector of claim 27 wherein the expression control sequence is selected from the group consisting of the lac system, the trp system, the tac system, the trc system, major operator and promoter regions of phage lambda, the control region of fd coat protein, the early and late promoters of SV40, promoters derived from polyoma, adenovirus, retrovirus, baculovirus and simian virus, the promoter for 3-phosphoglycerate kinase, the promoters of yeast acid phosphatase, the promoter of the yeast alpha-mating factors and combinations thereof.

29. A host transformed with the vector according to 
claim 26. 

30. The host of claim 29 selected from the group consisting of strains of E. coli, Pseudomonas, Bacillus subtilis, Bacillus stearothermophilus, or other bacilli; other bacteria; yeast; fungi; insect; mouse or other animal; plant hosts; or human tissue cells.

31. The host of claim 30 wherein said human tissue cells are human epithelial cells.

32. A method for producing a mutant CFTR polypeptide comprising the steps of:

(a) culturing a host cell transfected by the vector of claim 20 in a medium and under conditions favorable for expression of mutant CFTR polypeptide; and

(b) isolating the expressed mutant CFTR polypeptide.

33. A purified normal CFTR polypeptide characterized by
a peptide molecular weight of about 170,000 daltons and
cell transmembrane ion conductance affecting activity.

34. A purified normal CFTR polypeptide characterized by 
a peptide molecular weight of about 170,000 daltons and
epithelial cell transmembrane ion conductance affecting
activity.

35. A normal CFTR polypeptide substantially free of 
other human proteins and encoded by the DNA sequence
recited in claim 1.

36. A polypeptide coded by expression of a DNA sequence 
recited in claim 1, said polypeptide displaying the
immunological or biological activity of normal CFTR
polypeptide.

37. A substantially pure normal CFTR polypeptide 
according to claim 35 made by chemical or enzymatic
peptide synthesis.
38. The protein of claim 37 wherein fragments thereof
are prepared by chemical synthesis techniques. 

39. A protein fragment comprising a portion of said
amino acid sequence of claim 38.

40. A substantially pure CFTR protein and homologues 
thereof, normally expressed in human epithelial cells 
and characterized by being capable of participating in
regulation and control of ion transport through
epithelial cells by binding to epithelial cell membrane
to modulate ion movement through channels formed in
epithelial cell membrane.

41. A protein of claim 40 wherein said protein and
homologues thereof have a molecular weight of about
170,000 daltons.

42. A protein of claim 40 wherein said protein has two
repeated motifs, each motif comprising a set of amino
acid residues capable of spanning an epithelial cell
membrane several times followed by an amino acid
sequence constituting a nucleotide (ATP)-binding fold.

43. A protein of claim 42 wherein each of said set of
amino acid residues comprises six highly hydrophobic
segments capable of spanning a lipid bilayer of an
epithelial cell membrane.

44. A protein of claim 42 wherein an amino acid
deletion is present in the first of said nucleotide
(ATP)-binding folds of said two repeated motifs from the
N-terminal of said protein, said deletion being
phenylalanine.
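The following sketch shows how the deleted phenylalanine can be located by comparing sequences. It is written in Python, and `find_deletion` and `FIRST_RESIDUE` are hypothetical names. The normal sequence is FIG. 1 nucleotides 1639 to 1662, which encode residues 503 to 510. The mutant sequence is the claim 25 probe. This pairing rests on the assumption that the probe represents the corresponding mutant allele. The helper also assumes the two sequences differ by exactly one contiguous deletion. It reports the 3'-most placement consistent with both sequences, because inside a repeat the exact position of a deletion is ambiguous.

```python
# Sketch: locate one contiguous deletion between a normal and a mutant sequence.
# Assumes the standard genetic code and a single deletion; names are illustrative.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate whole codons starting at the first base."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def find_deletion(normal: str, mutant: str) -> tuple[int, str]:
    """Return (0-based start, deleted text) for a single contiguous deletion."""
    size = len(normal) - len(mutant)
    if size <= 0:
        raise ValueError("mutant is not shorter than normal")
    i = 0
    # Advance over the common prefix; the deletion cannot start later than this.
    while i < len(mutant) and normal[i] == mutant[i]:
        i += 1
    # What follows the deleted stretch must match the rest of the mutant.
    if normal[i + size:] != mutant[i:]:
        raise ValueError("sequences differ by more than one contiguous deletion")
    return i, normal[i:i + size]


normal_dna = "AAAGAAAATATCATCTTTGGTGTT"  # FIG. 1, nucleotides 1639-1662
mutant_dna = "AAAGAAAATATCATTGGTGTT"     # claim 25 probe (assumed mutant allele)
FIRST_RESIDUE = 503                      # residue encoded by the first codon above

print(find_deletion(normal_dna, mutant_dna))  # (14, 'CTT')
normal_aa, mutant_aa = translate(normal_dna), translate(mutant_dna)
print(normal_aa, mutant_aa)                   # KENIIFGV KENIIGV
offset, lost = find_deletion(normal_aa, mutant_aa)
print(lost, FIRST_RESIDUE + offset)           # F 508
```

The three deleted bases (CTT, nucleotides 1653 to 1655 in FIG. 1 numbering) straddle the codons for residues 507 and 508. Ile507 is therefore kept, because ATC becomes ATT, and only Phe508 is lost. That is the single-residue deletion that claim 44 describes.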

45. A protein of claim 42 wherein between said two
repeated motifs is a highly charged cytoplasmic domain. 

46. A protein of claim 45 wherein said protein has a
molecular weight of approximately 170,000 daltons.

47. A protein of claim 36 isolated and purified from
epithelial cells of a mammal not affected by cystic 
fibrosis. 


48. A process for isolating said CFTR protein of claim 
47 comprising: 

(a) extracting peripheral proteins from membrane
of epithelial cells to provide membrane material having
integral proteins including said CFTR protein;

(b) solubilizing said integral proteins of said
membrane material to form a solution of said integral
proteins;

(c) separating said CFTR protein to remove any
remaining other proteins of mammalian origin.

49. A process of claim 48 wherein said mammal is human, 
bovine, pig, sheep, horse, mouse, rat, hamster, or 
rabbit. 






50. A process for isolating said CFTR protein of claim 
47 comprising: 

(a) solubilizing protein of epithelial cell 
membrane in which said CFTR protein is expressed, to 
provide a solution of said CFTR protein; 

(b) separating said CFTR protein from said 
solution by contacting said solution with antibodies to 
said CFTR protein, said antibodies being immobilized on 
a substrate; 

(c) rinsing said substrate to remove protein not 
adhered to said antibodies; 
(d) releasing said CFTR protein from said
antibodies to isolate thereby said CFTR protein, and 

(e) purifying said CFTR protein to remove any
remaining other mammalian protein.

51. A process of isolating the CFTR protein of claim 36 
from cells containing said protein, comprising the steps 
of: 

(a) solubilizing protein of cell membrane in which
said CFTR protein is expressed, to provide a solution of
said CFTR protein;

(b) separating said CFTR protein from said
solution by contacting said solution with antibodies to
said CFTR protein, said antibodies being immobilized on
a substrate;

(c) rinsing said substrate to remove protein not
adhered to said antibodies;

(d) releasing said CFTR protein from said
antibodies to isolate thereby said CFTR protein, and

(e) purifying said CFTR protein to remove any
remaining other mammalian protein.

52. A purified protein of human cell membrane origin
comprising an amino acid sequence encoded by said mutant
DNA sequence of claim 15 or 16, said protein, when
present in human cell membrane, being associated with
altered cell function which correlates with the genetic
disease, cystic fibrosis.

53. A purified mutant CFTR polypeptide characterized by
cystic fibrosis-associated activity in human epithelial
cells.

54. A mutant CFTR polypeptide substantially free of
other human proteins and encoded by the DNA sequence
recited in claim 19.
55. A substantially pure mutant CFTR polypeptide
according to claim 54 made by chemical or enzymatic 
peptide synthesis. 


56. A polypeptide coded for by expression of a DNA 
sequence recited in claim 19. 

57. A purified protein fragment comprising a portion of
said amino acid sequence of claim 52.

58. A process of isolating the mutant CFTR protein of
claim 56 from cells containing said protein, comprising
the steps of:

(a) solubilizing protein of cell membrane in which
said mutant CFTR protein is expressed, to provide a
solution of said mutant CFTR protein;

(b) separating said mutant CFTR protein from said
solution by contacting said solution with antibodies to
said mutant CFTR protein, said antibodies being
immobilized on a substrate;

(c) rinsing said substrate to remove protein not
adhered to said antibodies;

(d) releasing said mutant CFTR protein from said
antibodies to isolate thereby said mutant CFTR protein,
and

(e) purifying said mutant CFTR protein to remove
any remaining other mammalian protein.

59. A method for screening a subject to determine if
said subject is a CF carrier or a CF patient comprising
the steps of:

providing a biological sample of the subject to be
screened; and providing an assay for detecting in the
biological sample the presence of at least one member
of the group consisting of the normal CF gene, normal
CF gene products, a mutant CF gene, mutant CF gene
products and mixtures thereof.

60. The method of claim 59 wherein the biological
sample includes at least part of the genome of the
subject and the assay comprises a hybridization assay.

61. The method of claim 60 wherein the assay further
comprises a labelled nucleotide probe according to claim
5.

62. The method of claim 60 wherein the assay further 
comprises a labelled nucleotide probe according to claim 
24. 


63. The method of claim 61 wherein said probe 
comprises the nucleotide sequence of claim 6. 

64. The method of claim 62 wherein said probe comprises
the nucleotide sequence of claim 25.

65. The method of claim 59 wherein the biological 
sample includes a CFTR polypeptide of the subject and 
the assay comprises an immunological assay. 


66. The method of claim 65 wherein the assay further
includes an antibody specific for the normal CFTR
polypeptide.

67. The method of claim 65 wherein the assay further
includes an antibody specific for a mutant CFTR
polypeptide.

68. The method of claim 65 wherein the assay is a
radioimmunoassay.
69. The method of claim 66 wherein the antibody is at
least one monoclonal antibody. 

70. The method of claim 67 wherein the antibody is at
least one monoclonal antibody.

71. The method of claim 59 wherein the subject is a 
human fetus in utero. 

72. The method of claim 61 wherein the assay further
includes at least one additional nucleotide probe
according to claim 5.

73. The method of claim 62 wherein the assay further
includes at least one additional nucleotide probe
according to claim 24.

74. The method of claim 72 wherein the assay further
includes a second nucleotide probe comprising a
different DNA sequence fragment of the DNA of Figure 1
or its RNA homologue or a different DNA sequence
fragment of human chromosome 7 and located on either
side of the DNA sequence of Figure 1.

75. The method of claim 73 wherein the assay further
includes a second nucleotide probe comprising a
different DNA sequence fragment of the DNA of Figure 1
or its RNA homologue or a different DNA sequence
fragment of human chromosome 7 and located on either
side of the DNA sequence of Figure 1.

76. In a process for screening a potential CF carrier
or patient to indicate the presence of an identified
cystic fibrosis mutation in the CF gene, said process
including the steps of:
(a) isolating genomic DNA from said potential CF
carrier or said potential patient; 

(b) hybridizing a DNA probe onto said isolated
genomic DNA, said DNA probe spanning said mutation in
said CF gene wherein said DNA probe is capable of
detecting said mutation;

(c) treating said genomic DNA to determine
presence or absence of said DNA probe and thereby
indicating, in accordance with a predetermined manner of
hybridization, the presence or absence of said cystic
fibrosis mutation.

77. A process for detecting cystic fibrosis carriers or
patients wherein said process consists of determining
the presence or absence of a restriction endonuclease
site in the mutant CF gene.
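Claim 77 does not name a restriction enzyme, so this sketch keeps the recognition sequence as a parameter. It is written in Python, and `site_positions` and `has_site` are hypothetical helpers. The sketch scans both strands of a sequence for the site, which is the comparison that tells apart an allele that gains or loses a site. The EcoRI recognition sequence GAATTC and the short test sequences are illustrative only and are not taken from the CF gene.

```python
# Sketch: report where a restriction recognition site occurs on either strand.
# The site is a parameter because claim 77 does not identify a particular enzyme.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def site_positions(seq: str, site: str) -> list[int]:
    """Sorted 0-based top-strand start positions of `site` on either strand."""
    seq, site = seq.upper(), site.upper()
    hits = set()
    # A palindromic site equals its reverse complement, so the set holds one pattern.
    for pattern in {site, reverse_complement(site)}:
        start = seq.find(pattern)
        while start != -1:  # stepping by one base also catches overlapping matches
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def has_site(seq: str, site: str) -> bool:
    return bool(site_positions(seq, site))


# Illustrative sequences only: allele_b differs by one base inside the second site.
ECORI_SITE = "GAATTC"
allele_a = "GGAATTCCGAATTC"
allele_b = "GGAATTCCGATTTC"
print(site_positions(allele_a, ECORI_SITE))  # [1, 8]
print(site_positions(allele_b, ECORI_SITE))  # [1]
```

In a laboratory assay, the same presence or absence would show up as a change in the fragment pattern after digestion. The string scan above only shows the sequence logic behind that readout.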



78. A process for detecting cystic fibrosis carriers
wherein said process consists of determining
differential mobility of heteroduplex PCR products in
polyacrylamide gels as a result of insertions or
deletions in the mutant gene.

79. A kit for assaying for the presence of a CF gene by
immunoassay comprising:

(a) an antibody which specifically binds to a gene 
product of the CF gene; 

(b) reagent means for detecting the binding of the 
antibody to the gene product; and 

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

80. The kit of claim 79 wherein said reagent means for
detecting binding is selected from the group consisting
of fluorescence detection, radioactive decay detection,
enzyme activity detection or colorimetric detection.



WO 91/02796 



125 



PCT/CA90/00267 



81. The kit of claim 79 wherein said CF gene is the 
normal CF gene. 

82. The kit of claim 79 wherein said CF gene is the
mutant CF gene.

83. A kit for assaying for the presence of a CF gene by 
hybridization comprising: 

(a) an oligonucleotide probe which specifically
binds to the CF gene;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to the CF gene; and

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay.

84. The kit of claim 83 wherein said CF gene is the 
normal CF gene. 

85. The kit of claim 83 wherein said CF gene is the
mutant CF gene.

86. An immunologically active anti-CFTR polyclonal or
monoclonal antibody specific for CFTR polypeptide as
recited in claim 34 or 38.

87. A hybridoma producing a monoclonal antibody 
specific for CFTR polypeptide as recited in claim 34 or 
38. 


88. A method of treatment for cystic fibrosis in a 
patient comprising the step of administering to the 
patient a therapeutically effective amount of the 
protein of claim 35. 



89. A method according to claim 88 wherein said protein
is administered by:

(a) combining said CFTR polypeptide with a lung
surfactant protein; and

(b) applying the combination of step (a) to
respiratory epithelial cells.

90. A method of gene therapy for cystic fibrosis
comprising the step of delivering, to a cell of a cystic
fibrosis patient, a DNA molecule according to claim 1.

91. The method of claim 90 wherein the step of delivery 
further comprises the step of providing a vehicle for 
delivery. 


92. The method of claim 91 wherein the vehicle is a 
recombinant vector. 

93. An animal comprising a heterologous cell system
comprising a recombinant cloning vector of claim 26
which induces cystic fibrosis symptoms in said animal.

94. The animal of claim 93 wherein said animal is a
mammal.

95. The animal of claim 94 wherein said mammal is a
rodent.

96. The animal of claim 95 wherein said rodent is a
mouse.

97. A transgenic mouse exhibiting cystic fibrosis
symptoms.
FIG. 1

1 AATTGGAAGCAAATGACATCACAGCAGGTCAGAGAAAAAGGGTTGAGCGGCAGGCACCCA 

61 GAGTAGTAGGTCTTTGGCATTAGGAGCTTGAGCCCAGACGGCCCTAGCAGGGACCCCAGC 

MQRSPLEKASVVSKLF 16 
121 GCCCGAGAGACCATGCAGAGGTCGCCTCTGGAAAAGGCCAGCGTTGTCTCCAAACTTTTT 

FSWTRPILRKGYRQRLELSD 36
181 TTCAGCTGGACCAGACCAATTTTGAGGAAAGGATACAGACAGCGCCTGGAATTGTCAGAC

IYQIPSVDSADNLSEKLERE 56
241 ATATACCAAATCCCTTCTGTTGATTCTGCTGACAATCTATCTGAAAAATTGGAAAGAGAA

WDRELASKKNPKLINALRRC 76
301 TGGGATAGAGAGCTGGCTTCAAAGAAAAATCCTAAACTCATTAATGCCCTTCGGCGATGT 

FFWRFMFYGIFLYLGEVTKA 96
361 TTTTTCTGGAGATTTATGTTCTATGGAATCTTTTTATATTTAGGGGAAGTCACCAAAGCA

VQPLLLGRIIASYDPDNKEE 116
421 GTACAGCCTCTCTTACTGGGAAGAATCATAGCTTCCTATGACCCGGATAACAAGGAGGAA 

RSIAIYLGIGLCLLFIVRTL 136
481 CGCTCTATCGCGATTTATCTAGGCATAGGCTTATGCCTTCTCTTTATTGTGAGGACACTG 

LLHPAIFGLHHIGMQMRIAM 156
541 CTCCTACACCCAGCCATTTTTGGCCTTCATCACATTGGAATGCAGATGAGAATAGCTATG

FSLIYKKTLKLSSRVLDKIS 176
601 TTTAGTTTGATTTATAAGAAGACTTTAAAGCTGTCAAGCCGTGTTCTAGATAAAATAAGT

IGQLVSLLSNNLNKFDEGLA 196
661 ATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAGGACTTGCA

LAHFVWIAPLQVALLMGLIW 216
721 TTGGCACATTTCGTGTGGATCGCTCCTTTGCAAGTGGCACTCCTCATGGGGCTAATCTGG 

ELLQASAFCGLGFLIVLALF 236
781 GAGTTGTTACAGGCGTCTGCCTTCTGTGGACTTGGTTTCCTGATAGTCCTTGCCCTTTTT 

QAGLGRMMMKYRDQRAGKIS 256
841 CAGGCTGGGCTAGGGAGAATGATGATGAAGTACAGAGATCAGAGAGCTGGGAAGATCAGT

ERLVITSEMIENIQSVKAYC 276
901 GAAAGACTTGTGATTACCTCAGAAATGATTGAAAATATCCAATCTGTTAAGGCATACTGC 

WEEAMEKMIENLRQTELKLT 296
961 TGGGAAGAAGCAATGGAAAAAATGATTGAAAACTTAAGACAAACAGAACTGAAACTGACT

RKAAYVRYFNSSAFFFSGFF 316
1021 CGGAAGGCAGCCTATGTGAGATACTTCAATAGCTCAGCCTTCTTCTTCTCAGGGTTCTTT 

VVFLSVLPYALIKGIILRKI 336
1081 GTGGTGTTTTTATCTGTGCTTCCCTATGCACTAATCAAAGGAATCATCCTCCGGAAAATA 

FTTISFCIVLRMAVTRQFPW 356
1141 TTCACCACCATCTCATTCTGCATTGTTCTGCGCATGGCGGTCACTCGGCAATTTCCCTGG 

AVQTWYDSLGAINKIQDFLQ 376
1201 GCTGTACAAACATGGTATGACTCTCTTGGAGCAATAAACAAAATACAGGATTTCTTACAA

KQEYKTLEYNLTTTEVVMEN 396
1261 [illegible] ATTGGAATATAACTTAACGACTACACAAGTAGTGATGGAGAAT
FIG. 1 (cont'd)



NNRKTSNGDDSLFFSNFSLL 436
1381 AACAATAGAAAAACTTCTAATGGTGATGACAGCCTCTTCTTCAGTAATTTCTCACTTCTT

GTPVLKDINFKIERGQLLAV 456
1441 GGTACTCCTGTCCTGAAAGATATTAATTTCAAGATAGAAAGAGGACAGTTGTTGGCGGTT

AGSTGAGKTSLLMMIMGELE 476
1501 GCTGGATCCACTGGAGCAGGCAAGACTTCACTTCTAATGATGATTATGGGAGAACTGGAG

PSEGKIKHSGRISFCSQFSW 496
1561 CCTTCAGAGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGG

IMPGTIKENIIFGVSYDEYR 516
1621 ATTATGCCTGGCACCATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGAATATAGA

YRSVIKACQLEEDISKFAEK 536
1681 TACAGAAGCGTCATCAAAGCATGCCAACTAGAAGAGGACATCTCCAAGTTTGCAGAGAAA

DNIVLGEGGITLSGGQRARI 556
1741 GACAATATAGTTCTTGGAGAAGGTGGAATCACACTGAGTGGAGGTCAACGAGCAAGAATT

SLARAVYKDADLYLLDSPFG 576
1801 TCTTTAGCAAGAGCAGTATACAAAGATGCTGATTTGTATTTATTAGACTCTCCTTTTGGA

YLDVLTEKEIFESCVCKLMA 596
1861 TACCTAGATGTTTTAACAGAAAAAGAAATATTTGAAAGCTGTGTCTGTAAACTGATGGCT

NKTRILVTSKMEHLKKADKI 616
1921 AACAAAACTAGGATTTTGGTCACTTCTAAAATGGAACATTTAAAGAAAGCTGACAAAATA 

LILNEGSSYFYGTFSELQNL 636
1981 TTAATTTTGAATGAAGGTAGCAGCTATTTTTATGGGACATTTTCAGAACTCCAAAATCTA

QPDFSSKLMGCDSFDQFSAE 656
2041 CAGCCAGACTTTAGCTCAAAACTCATGGGATGTGATTCTTTCGACCAATTTAGTGCAGAA 

RRNSILTETLHRFSLEGDAP 676

2101 AGAAGAAATTCAATCCTAACTGAGACCTTACACCGTTTCTCATTAGAAGGAGATGCTCCT 

VSWTETKKQSFKQTGEFGEK 696
2161 GTCTCCTGGACAGAAACAAAAAAACAATCTTTTAAACAGACTGGAGAGTTTGGGGAAAAA

RKNSILNPINSIRKFSIVQK 716
2221 AGGAAGAATTCTATTCTCAATCCAATCAACTCTATACGAAAATTTTCCATTGTGCAAAAG

TPLQMNGIEEDSDEPLERRL 736 
2281 ACTCCCTTACAAATGAATGGCATCGAAGAGGATTCTGATGAGCCTTTAGAGAGAAGGCTG 

SLVPDSEQGEAILPRISVIS 756
2341 TCCTTAGTACCAGATTCTGAGCAGGGAGAGGCGATACTGCCTCGCATCAGCGTGATCAGC 

TGPTLQARRRQSVLNLMTHS 776
2401 ACTGGCCCCACGCTTCAGGCACGAAGGAGGCAGTCTGTCCTGAACCTGATGACACACTCA

VNQGQNIHRKTTASTRKVSL 796
2461 GTTAACCAAGGTCAGAACATTCACCGAAAGACAACAGCATCCACACGAAAAGTGTCACTG 

APQANLTELDIYSRRLSQET 816 
2521 GCCCCTCAGGCAAACTTGACTGAACTGGATATATATTCAAGAAGGTTATCTCAAGAAACT 
FIG. 1 (cont'd)

GLEISEEINEEDLNECLFDD 836
2581 GGCTTGGAAATAAGTGAAGAAATTAACGAAGAAGACTTAAACGAGTGCCTTTTTGATGAT

MESIPAVTTWNTYLRYITVH 856
2641 ATGGAGAGCATACCAGCAGTGACTACATGGAACACATACCTTCGATATATTACTGTCCAC 

KSLIFVLIWCLVIFLAEVAA 876
2701 AAGAGCTTAATTTTTGTGCTAATTTGGTGCTTAGTAATTTTTCTGGCAGAGGTGGCTGCT 

SLVVLWLLGNTPLQDKGNST 896
2761 TCTTTGGTTGTGCTGTGGCTCCTTGGAAACACTCCTCTTCAAGACAAAGGGAATAGTACT

HSRNNSYAVIITSTSSYYVF 916
2821 CATAGTAGAAATAACAGCTATGCAGTGATTATCACCAGCACCAGTTCGTATTATGTGTTT 

YIYVGVADTLLAMGFFRGLP 936
2881 TACATTTACGTGGGAGTAGCCGACACTTTGCTTGCTATGGGATTCTTCAGAGGTCTACCA 

LVHTLITVSKILHHKMLHSV 956
2941 CTGGTGCATACTCTAATCACAGTGTCGAAAATTTTACACCACAAAATGTTACATTCTGTT

LQAPMSTLNTLKAGGILNRF 976
3001 CTTCAAGCACCTATGTCAACCCTCAACACGTTGAAAGCAGGTGGGATTCTTAATAGATTC

SKDIAILDDLLPLTIFDFIQ 996
3061 TCCAAAGATATAGCAATTTTGGATGACCTTCTGCCTCTTACCATATTTGACTTCATCCAG

LLLIVIGAIAVVAVLQPYIF 1016
3121 TTGTTATTAATTGTGATTGGAGCTATAGCAGTTGTCGCAGTTTTACAACCCTACATCTTT 

VATVPVIVAFIMLRAYFLQT 1036
3181 GTTGCAACAGTGCCAGTGATAGTGGCTTTTATTATGTTGAGAGCATATTTCCTCCAAACC 

SQQLKQLESEGRSPIFTHLV 1056
3241 TCACAGCAACTCAAACAACTGGAATCTGAAGGCAGGAGTCCAATTTTCACTCATCTTGTT 

TSLKGLWTLRAFGRQPYFET 1076
3301 ACAAGCTTAAAAGGACTATGGACACTTCGTGCCTTCGGACGGCAGCCTTACTTTGAAACT

LFHKALNLHTANWFLYLSTL 1096
3361 CTGTTCCACAAAGCTCTGAATTTACATACTGCCAACTGGTTCTTGTACCTGTCAACACTG 

RWFQMRIEMIFVIFFIAVTF 1116
3421 CGCTGGTTCCAAATGAGAATAGAAATGATTTTTGTCATCTTCTTCATTGCTGTTACCTTC 

ISILTTGEGEGRVGIILTLA 1136
3481 ATTTCCATTTTAACAACAGGAGAAGGAGAAGGAAGAGTTGGTATTATCCTGACTTTAGCC

MNIMSTLQWAVNSSIDVDSL 1156
3541 ATGAATATCATGAGTACATTGCAGTGGGCTGTAAACTCCAGCATAGATGTGGATAGCTTG

MRSVSRVFKFIDMPTEGKPT 1176
3601 ATGCGATCTGTGAGCCGAGTCTTTAAGTTCATTGACATGCCAACAGAAGGTAAACCTACC

KSTKPYKNGQLSKVMIIENS 1196
3661 AAGTCAACCAAACCATACAAGAATGGCCAACTCTCGAAAGTTATGATTATTGAGAATTCA

HVKKDDIWPSGGQMTVKDLT 1216
3721 CACGTGAAGAAAGATGACATCTGGCCCTCAGGGGGCCAAATGACTGTCAAAGATCTCACA

AKYTEGGNAILENISFSISP 1236
3781 GCAAAATACACAGAAGGTGGAAATGCCATATTAGAGAACATTTCCTTCTCAATAAGTCCT 






FIG. 1 (cont'd)

FLRLLNTEGEIQIDGVSWDS 1276
3901 TTTTTGAGACTACTGAACACTGAAGGAGAAATCCAGATCGATGGTGTGTCTTGGGATTCA 

ITLQQWRKAFGVIPQKVFIF 1296
3961 ATAACTTTGCAACAGTGGAGGAAAGCCTTTGGAGTGATACCACAGAAAGTATTTATTTTT 

SGTFRKNLDPYEQWSDQEIW 1316
4021 TCTGGAACATTTAGAAAAAACTTGGATCCCTATGAACAGTGGAGTGATCAAGAAATATGG

KVADEVGLRSVIEQFPGKLD 1336
4081 AAAGTTGCAGATGAGGTTGGGCTCAGATCTGTGATAGAACAGTTTCCTGGGAAGCTTGAC

FVLVDGGCVLSHGHKQLMCL 1356
4141 TTTGTCCTTGTGGATGGGGGCTGTGTCCTAAGCCATGGCCACAAGCAGTTGATGTGCTTG

ARSVLSKAKILLLDEPSAHL 1376
4201 GCTAGATCTGTTCTCAGTAAGGCGAAGATCTTGCTGCTTGATGAACCCAGTGCTCATTTG 

DPVTYQIIRRTLKQAFADCT 1396
4261 GATCCAGTAACATACCAAATAATTAGAAGAACTCTAAAACAAGCATTTGCTGATTGCACA

VILCEHRIEAMLECQQFLVI 1416
4321 GTAATTCTCTGTGAACACAGGATAGAAGCAATGCTGGAATGCCAACAATTTTTGGTCATA

EENKVRQYDSIQKLLNERSL 1436
4381 GAAGAGAACAAAGTGCGGCAGTACGATTCCATCCAGAAACTGCTGAACGAGAGGAGCCTC

FRQAISPSDRVKLFPHRNSS 1456
4441 TTCCGGCAAGCCATCAGCCCCTCCGACAGGGTGAAGCTCTTTCCCCACCGGAACTCAAGC 

KCKSKPQIAALKEETEEEVQ 1476
4501 AAGTGCAAGTCTAAGCCCCAGATTGCTGCTCTGAAAGAGGAGACAGAAGAAGAGGTGCAA 

DTRL 1480

4561 GATACAAGGCTTTAGAGAGCAGCATAAATGTTGACATGGGACATTTGCTCATGGAATTGG 

4621 AGCTCGTGGGACAGTCACCTCATGGAATTGGAGCTCGTGGAACAGTTACCTCTGGCTCAG

4681 AAAACAAGGATGAATTAAGTTTTTTTTTAAAAAAGAAACATTTGGTAAGGGGAATTGAGG

4741 ACACTGATATGGGTCTTGATAAATGGCTTCCTGGCAATAGTCAAATTGTGTGAAAGGTAC

4801 TTCAAATCCTTGAAGATTTACCACTTGTGTTTTGCAAGCCAGATTTTCCTGAAAACCCTT 

4861 GCCATGTGCTAGTAATTGGAAAGGCAGCTCTAAATGTCAATCAGCCTAGTTGATCAGCTT 

4921 ATTGTCTAGTGAAACTCGTTAATTTGTAGTGTTGGAGAAGAACTGAAATCATACTTCTTA 

4981 GGGTTATGATTAAGTAATGATAACTGGAAACTTCAGCGGTTTATATAAGCTTGTATTCCT

5041 TTTTCTCTCCTCTCCCCATGATGTTTAGAAACACAACTATATTGTTTGCTAAGCATTCCA 

5101 ACTATCTCATTTCCAAGCAAGTATTAGAATACCACAGGAACCACAAGACTGCACATCAAA 

5161 ATATGCCCCATTCAACATCTAGTGAGCAGTCAGGAAAGAGAACTTCCAGATCCTGGAAAT 

5221 CAGGGTTAGTATTGTCCAGGTCTACCAAAAATCTCAATATTTCAGATAATCACAATACAT 

5281 CCCTTACCTGGGAAAGGGCTGTTATAATCTTTCACAGGGGACAGGATGGTTCCCTTGATG 

5341 AAGAAGTTGATATGCCTTTTCCCAACTCCAGAAAGTGACAAGCTCACAGACCTTTGAACT 

5401 AGAGTTTAGCTGGAAAAGTATGTTAGTGCAAATTGTCACAGGACAGCCCTTCTTTCCACA

5461 GAAGCTCCAGGTAGAGGGTGTGTAAGTAGATAGGCCATGGGCACTGTGGGTAGACACACA

5521 TGAAGTCCAAGCATTTAGATGTATAGGTTGATGGTGGTATGTTTTCAGGCTAGATGTATG 

5581 TACTTCATGCTGTCTACACTAAGAGAGAATGAGAGACACACTGAAGAAGCACCAATCATG 

5641 AATTAGTTTTATATGCTTCTGTTTTATAATTTTGTGAAGCAAAATTTTTTCTCTAGGAAA 

5701 TATTTATTTTAATAATGTTTCAAACATATATTACAATGCTGTATTTTAAAAGAATGATTA

5761 TGAATTACATTTGTATAAAATAATTTTTATATTTGAAATATTGACTTTTTATGGCACTAG

5821 TATTTTTATGAAATATTATGTTAAAACTGGGACAGGGGAGAACCTAGGGTGATATTAACC

5881 AGGGGCCATGAATCACCTTTTGGTCTGGAGGGAAGCCTTGGGGCTGATCGAGTTGTTGCC

5941 CACAGCTGTATGATTCCCAGCCAGACACAGCCTCTTAGATGCAGTTCTGAAGAAGATGGT

6001 ACCACCAGTCTGACTGTTTCCATCAAGGGTACACTGCCTTCTCAACTCCAAACTGACTCT

6061 TAAGAAGACTGCATTATATTTATTACTGTAAGAAAATATCACTTGTCAATAAAATCCATA 

6121 CATTTGTGT (A) n 
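FIG. 1 numbers nucleotides from the first base shown and amino acids from the initiating methionine. That ATG begins at nucleotide 133, the thirteenth base of row 121. The following is a minimal sketch in Python, assuming the standard genetic code. `CDS_START` and `codon_start` are hypothetical names. The sketch converts between the two numbering systems and checks the result against rows of the figure. Row 121 translates to MQRSPLEKASVVSKLF. The codon for residue 508 is TTT in row 1621. The codon for residue 1480 is followed directly by the TAG stop codon in row 4561.

```python
# Sketch: relate FIG. 1 nucleotide numbering to residue numbering.
# Assumes the standard genetic code and a coding sequence starting at nucleotide 133.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

CDS_START = 133  # first base of the initiating ATG in FIG. 1


def translate(seq: str) -> str:
    """Translate whole codons starting at the first base ('*' marks a stop)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def codon_start(residue: int) -> int:
    """FIG. 1 nucleotide number of the first base of the codon for `residue`."""
    return CDS_START + 3 * (residue - 1)


# Rows copied from FIG. 1; each string starts at the nucleotide in its name.
row_121 = "GCCCGAGAGACCATGCAGAGGTCGCCTCTGGAAAAGGCCAGCGTTGTCTCCAAACTTTTT"
row_1621 = "ATTATGCCTGGCACCATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGAATATAGA"
row_4561 = "GATACAAGGCTTTAGAGAGCAGCATAAATGTTGACATGGGACATTTGCTCATGGAATTGG"

print(translate(row_121[CDS_START - 121:]))  # MQRSPLEKASVVSKLF
i = codon_start(508) - 1621
print(codon_start(508), row_1621[i:i + 3])   # 1654 TTT
stop = codon_start(1480) + 3                 # first base after the last sense codon
print(stop, translate(row_4561[:stop + 3 - 4561]))  # 4573 DTRL*
```

Because each residue occupies three bases, the 1480-residue product accounts for 4440 coding nucleotides (133 to 4572), followed by the stop codon at 4573 to 4575.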
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FIG/16, (cont'd) 